race condition in pg_class

Started by Smolkin Grigory · about 2 years ago · 83 messages
#1Smolkin Grigory
smallkeen@gmail.com
1 attachment(s)

Hello, hackers!
We are running PG13.10 and recently we have encountered what appears to be
a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
some other catalog-writer, possibly ANALYZE.
The problem is that after successfully creating an index on a relation
(which previously didn't have any indexes), its pg_class.relhasindex
remains set to "false", which is illegal, I think.
Index was built using the following statement:
ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);

PG_CLASS:
# select ctid,oid,xmin,xmax, * from pg_class where oid = 3566558198;
-[ RECORD 1
]-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ctid | (12,49)
oid | 3566558198
xmin | 1202298791
xmax | 0
relname | example
relnamespace | 16479
reltype | 3566558200
reloftype | 0
relowner | 16386
relam | 2
relfilenode | 3566558198
reltablespace | 0
relpages | 152251
reltuples | 1.1565544e+07
relallvisible | 127837
reltoastrelid | 3566558203
relhasindex | f
relisshared | f
relpersistence | p
relkind | r
relnatts | 12
relchecks | 0
relhasrules | f
relhastriggers | f
relhassubclass | f
relrowsecurity | f
relforcerowsecurity | f
relispopulated | t
relreplident | d
relispartition | f
relrewrite | 0
relfrozenxid | 1201647807
relminmxid | 1
relacl |
reloptions |
relpartbound |

PG_INDEX:
# select ctid,xmin,xmax,indexrelid::regclass,indrelid::regclass, * from
pg_index where indexrelid = 3569625749;
-[ RECORD 1 ]--+---------------------------------------------
ctid | (3,30)
xmin | 1202295045
xmax | 0
indexrelid | "example_pkey"
indrelid | "example"
indexrelid | 3569625749
indrelid | 3566558198
indnatts | 1
indnkeyatts | 1
indisunique | t
indisprimary | t
indisexclusion | f
indimmediate | t
indisclustered | f
indisvalid | t
indcheckxmin | f
indisready | t
indislive | t
indisreplident | f
indkey | 1
indcollation | 0
indclass | 3124
indoption | 0
indexprs |
indpred |

Looking into the WAL via waldump gave us the following picture (full
waldump output is attached):

tx: 1202295045, lsn: AAB1/D38378D0, prev AAB1/D3837208, desc: FPI , blkref
#0: rel 1663/16387/3569625749 blk 0 FPW
tx: 1202298790, lsn: AAB1/D3912EC0, prev AAB1/D3912E80, desc: NEW_CID rel
1663/16387/1259; tid 6/24; cmin: 0, cmax: 4294967295, combo: 4294967295
tx: 1202298790, lsn: AAB1/D3927580, prev AAB1/D3926988, desc: COMMIT
2023-10-04 22:41:23.863979 UTC
tx: 1202298791, lsn: AAB1/D393C230, prev AAB1/D393C1F0, desc: HOT_UPDATE
off 24 xmax 1202298791 flags 0x20 ; new off 45 xmax 0, blkref #0: rel
1663/16387/1259 blk 6
tx: 1202298791, lsn: AAB1/D394ADA0, prev AAB1/D394AD60, desc: UPDATE off 45
xmax 1202298791 flags 0x00 ; new off 28 xmax 0, blkref #0: rel
1663/16387/1259 blk 5, blkref #1: rel 1663/16387/1259 blk 6
tx: 1202298791, lsn: AAB1/D3961088, prev AAB1/D3961048, desc: NEW_CID rel
1663/16387/1259; tid 12/49; cmin: 50, cmax: 4294967295, combo: 4294967295
tx: 1202295045, lsn: AAB1/D3962E78, prev AAB1/D3962E28, desc: INPLACE off
24, blkref #0: rel 1663/16387/1259 blk 6
tx: 1202295045, lsn: AAB1/D39632A0, prev AAB1/D3963250, desc: COMMIT
2023-10-04 22:41:23.878565 UTC
tx: 1202298791, lsn: AAB1/D3973420, prev AAB1/D39733D0, desc: COMMIT
2023-10-04 22:41:23.884951 UTC

1202295045 - the create index statement
1202298790 and 1202298791 are some other concurrent operations;
unfortunately, I wasn't able to determine what they are.

So it looks like 1202295045 updated tuple (6,24) in pg_class INPLACE; at
that point, its xmax had already been set by 1202298791 and a new tuple
had been created at (12,49).
So after 1202298791 committed, that inplace update was effectively lost.
If we do an inclusive PITR with (recovery_target_xid = 1202295045), we can
see the following picture (notice relhasindex and xmax):

# select ctid,oid, xmin,xmax,relhasindex,cmin,cmax from pg_class where oid
= 3566558198;
-[ RECORD 1 ]-----------
ctid | (6,24)
oid | 3566558198
xmin | 1202298790
xmax | 1202298791
relhasindex | t
cmin | 0
cmax | 0

I've tried to reproduce this scenario with CREATE INDEX and various
concurrent statements, but no luck.
Attached full waldump output for the relevant WAL segment.

Attachments:

waldump.txt (text/plain; charset=US-ASCII)
#2Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Smolkin Grigory (#1)
Re: race condition in pg_class

On 25 Oct 2023, at 13:39, Smolkin Grigory <smallkeen@gmail.com> wrote:

We are running PG13.10 and recently we have encountered what appears to be a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and some other catalog-writer, possibly ANALYZE.

I've tried to reproduce this scenario with CREATE INDEX and various concurrent statements, but no luck.

Maybe it would be possible to reproduce this by modifying the tests for concurrent index creation. For example, add “ANALYZE” here [0].
Keep in mind that for easier reproduction it would make sense to increase the transaction count radically.
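
A rough, untested sketch of what I mean (the pgbench() helper and the
script-file naming follow 002_cic.pl's existing structure; the transaction
count and the output regex are illustrative only):

$node->pgbench(
	'--no-vacuum --client=5 --transactions=1000',
	0,
	[qr{actually processed}],
	'concurrent INSERTs, CIC, and ANALYZE',
	{
		'002_pgbench_concurrent_cic' => q(
			DROP INDEX CONCURRENTLY idx;
			CREATE INDEX CONCURRENTLY idx ON tbl(i);
			SELECT bt_index_check('idx',true);
		),
		'002_pgbench_concurrent_analyze' => q(
			ANALYZE tbl;
		),
	});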

Best regards, Andrey Borodin.

[0]: https://github.com/postgres/postgres/blob/master/contrib/amcheck/t/002_cic.pl#L34

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Smolkin Grigory (#1)
Re: race condition in pg_class

Smolkin Grigory <smallkeen@gmail.com> writes:

We are running PG13.10 and recently we have encountered what appears to be
a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
some other catalog-writer, possibly ANALYZE.
The problem is that after successfully creating an index on a relation
(which previously didn't have any indexes), its pg_class.relhasindex
remains set to "false", which is illegal, I think.
Index was built using the following statement:
ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);

ALTER TABLE ADD CONSTRAINT would certainly have taken
AccessExclusiveLock on the "example" table, which should be sufficient
to prevent anything else from touching its pg_class row. The only
mechanism I can think of that might bypass that is a manual UPDATE on
pg_class, which would just manipulate the row as a row without concern
for associated relation-level locks. Any chance that somebody was
doing something like that?

regards, tom lane

#4Smolkin Grigory
smallkeen@gmail.com
In reply to: Tom Lane (#3)
Re: race condition in pg_class

ALTER TABLE ADD CONSTRAINT would certainly have taken
AccessExclusiveLock on the "example" table, which should be sufficient
to prevent anything else from touching its pg_class row. The only
mechanism I can think of that might bypass that is a manual UPDATE on
pg_class, which would just manipulate the row as a row without concern
for associated relation-level locks. Any chance that somebody was
doing something like that?

No chance. Our infrastructure doesn't do that, and users simply don't have
the privileges to mess with pg_catalog.

On Wed, 25 Oct 2023 at 21:06, Tom Lane <tgl@sss.pgh.pa.us>:


Smolkin Grigory <smallkeen@gmail.com> writes:

We are running PG13.10 and recently we have encountered what appears to be
a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
some other catalog-writer, possibly ANALYZE.
The problem is that after successfully creating an index on a relation
(which previously didn't have any indexes), its pg_class.relhasindex
remains set to "false", which is illegal, I think.
Index was built using the following statement:
ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);

ALTER TABLE ADD CONSTRAINT would certainly have taken
AccessExclusiveLock on the "example" table, which should be sufficient
to prevent anything else from touching its pg_class row. The only
mechanism I can think of that might bypass that is a manual UPDATE on
pg_class, which would just manipulate the row as a row without concern
for associated relation-level locks. Any chance that somebody was
doing something like that?

regards, tom lane

#5Noah Misch
noah@leadboat.com
In reply to: Smolkin Grigory (#1)
Re: race condition in pg_class

On Wed, Oct 25, 2023 at 01:39:41PM +0300, Smolkin Grigory wrote:

We are running PG13.10 and recently we have encountered what appears to be
a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
some other catalog-writer, possibly ANALYZE.
The problem is that after successfully creating an index on a relation
(which previously didn't have any indexes), its pg_class.relhasindex
remains set to "false", which is illegal, I think.
Index was built using the following statement:
ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);

This is going to be a problem with any operation that does a transactional
pg_class update without taking a lock that conflicts with ShareLock. GRANT
doesn't lock the table at all, so I can reproduce this in v17 as follows:

== session 1
create table t (c int);
begin;
grant select on t to public;

== session 2
alter table t add primary key (c);

== back in session 1
commit;

We'll likely need to change how we maintain relhasindex or perhaps take a lock
in GRANT.
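
A quick cross-check against pg_index can flag tables already bitten (my
sketch, assuming no DDL runs concurrently with it):

SELECT c.oid::regclass
FROM pg_class c
WHERE NOT c.relhasindex
  AND EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid);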

Looking into the WAL via waldump gave us the following picture (full
waldump output is attached):

1202295045 - the create index statement
1202298790 and 1202298791 are some other concurrent operations;
unfortunately, I wasn't able to determine what they are.

Can you explore that as follows?

- PITR to just before the COMMIT record.
- Save all rows of pg_class.
- PITR to just after the COMMIT record.
- Save all rows of pg_class.
- Diff the two sets of saved rows.
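
For the "save all rows" steps, a psql sketch (file names arbitrary):

\copy (SELECT * FROM pg_class ORDER BY oid) TO 'pg_class_before.txt'
-- after the second PITR:
\copy (SELECT * FROM pg_class ORDER BY oid) TO 'pg_class_after.txt'
-- then, from a shell: diff pg_class_before.txt pg_class_after.txt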

Which columns changed? The evidence you've shown would be consistent with a
transaction doing GRANT or REVOKE on dozens of tables. If the changed column
is something other than relacl, that would be great to know.

On the off-chance it's relevant, what extensions do you have (\dx in psql)?

#6Smolkin Grigory
smallkeen@gmail.com
In reply to: Noah Misch (#5)
Re: race condition in pg_class

This is going to be a problem with any operation that does a transactional
pg_class update without taking a lock that conflicts with ShareLock. GRANT
doesn't lock the table at all, so I can reproduce this in v17 as follows:

== session 1
create table t (c int);
begin;
grant select on t to public;

== session 2
alter table t add primary key (c);

== back in session 1
commit;

We'll likely need to change how we maintain relhasindex or perhaps take a
lock in GRANT.

Oh, that explains it. Thank you very much.

Can you explore that as follows?

- PITR to just before the COMMIT record.
- Save all rows of pg_class.
- PITR to just after the COMMIT record.
- Save all rows of pg_class.
- Diff the two sets of saved rows.

Sure, but it will take some time; it's a large DB with lots of WAL
segments to apply.

extensions

extname | extversion
--------------------+------------
plpgsql | 1.0
pg_stat_statements | 1.8
pg_buffercache | 1.3
pgstattuple | 1.5

#7Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#5)
Re: race condition in pg_class

On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote:

On Wed, Oct 25, 2023 at 01:39:41PM +0300, Smolkin Grigory wrote:

We are running PG13.10 and recently we have encountered what appears to be
a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
some other catalog-writer, possibly ANALYZE.
The problem is that after successfully creating an index on a relation
(which previously didn't have any indexes), its pg_class.relhasindex
remains set to "false", which is illegal, I think.

It's damaging. The table will behave like it has no indexes. If something
adds an index later, old indexes will reappear, corrupt, having not received
updates during the relhasindex=false era. ("pg_amcheck --heapallindexed" can
detect this.)
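
For example (pg_amcheck ships with v14+; against a v13 server one can call
the amcheck extension's bt_index_check(index, heapallindexed => true)
directly):

pg_amcheck --heapallindexed --table=example dbname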

Index was built using the following statement:
ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);

This is going to be a problem with any operation that does a transactional
pg_class update without taking a lock that conflicts with ShareLock. GRANT
doesn't lock the table at all, so I can reproduce this in v17 as follows:

== session 1
create table t (c int);
begin;
grant select on t to public;

== session 2
alter table t add primary key (c);

== back in session 1
commit;

We'll likely need to change how we maintain relhasindex or perhaps take a lock
in GRANT.

The main choice is accepting more DDL blocking vs. accepting inefficient
relcache builds. Options I'm seeing:

=== "more DDL blocking" option family

B1. Take ShareUpdateExclusiveLock in GRANT, REVOKE, and anything that makes
transactional pg_class updates without holding some stronger lock. New
asserts could catch future commands failing to do this.

B2. Take some shorter-lived lock around pg_class tuple formation, such that
GRANT blocks CREATE INDEX, but two CREATE INDEX don't block each other.
Anything performing a transactional update of a pg_class row would acquire
the lock in exclusive mode before fetching the old tuple and hold it till
end of transaction. relhasindex=true in-place updates would acquire it
the same way, but they would release it after the inplace update. I
expect a new heavyweight lock type, e.g. LOCKTAG_RELATION_DEFINITION, with
the same key as LOCKTAG_RELATION. This has less blocking than the
previous option, but it's more complicated to explain to both users and
developers.

B3. Have CREATE INDEX do an EvalPlanQual()-style thing to update all successor
tuple versions. Like the previous option, this would require new locking,
but the new lock would not need to persist till end of xact. It would be
even more complicated to explain to users and developers. (If this is
promising enough to warrant more detail, let me know.)

B4. Use transactional updates to set relhasindex=true. Two CREATE INDEX
commands on the same table would block each other. If we did it the way
most DDL does today, they'd get "tuple concurrently updated" failures
after the blocking ends.

=== "inefficient relcache builds" option family

R1. Ignore relhasindex; possibly remove it in v17. Relcache builds et
al. will issue more superfluous queries.

R2. As a weird variant of the previous option, keep relhasindex and make all
transactional updates of pg_class set relhasindex=true pessimistically.
(VACUUM will set it back to false.)

=== other

O1. This is another case where the sometimes-discussed "pg_class_nt" for
nontransactional columns would help. I'm ruling that out as too hard to
back-patch.

Are there other options important to consider? I currently like (B1) the
most, followed closely by (R1) and (B2). A key unknown is the prevalence of
index-free tables. Low prevalence would argue in favor of (R1). In my
limited experience, they've been rare. That said, I assume relcache builds
happen a lot more than GRANTs, so it's harder to bound the damage from (R1)
compared to the damage from (B1). Thoughts on this decision?

Thanks,
nm

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#7)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote:

We'll likely need to change how we maintain relhasindex or perhaps take a lock
in GRANT.

The main choice is accepting more DDL blocking vs. accepting inefficient
relcache builds. Options I'm seeing:

It looks to me like you're only thinking about relhasindex, but it
seems to me that any call of heap_inplace_update brings some
risk of this kind. Excluding the bootstrap-mode-only usage in
create_toast_table, I see four callers:

* index_update_stats updating a pg_class tuple's
relhasindex, relpages, reltuples, relallvisible

* vac_update_relstats updating a pg_class tuple's
relpages, reltuples, relallvisible, relhasindex, relhasrules,
relhastriggers, relfrozenxid, relminmxid

* vac_update_datfrozenxid updating a pg_database tuple's
datfrozenxid, datminmxid

* dropdb updating a pg_database tuple's datconnlimit

So we have just as much of a problem with GRANTs on databases
as GRANTs on relations. Also, it looks like we can lose
knowledge of the presence of rules and triggers, which seems
nearly as bad as forgetting about indexes. The rest of these
updates might not be correctness-critical, although I wonder
how bollixed things could get if we forget an advancement of
relfrozenxid or datfrozenxid (especially if the calling
transaction goes on to make other changes that assume that
the update happened).

BTW, vac_update_datfrozenxid believes (correctly I think) that
it cannot use the syscache copy of a tuple as the basis for in-place
update, because syscache will have detoasted any toastable fields.
These other callers are ignoring that, which seems like it should
result in heap_inplace_update failing with "wrong tuple length".
I wonder how come we're not seeing reports of that from the field.

I'm inclined to propose that heap_inplace_update should check to
make sure that it's operating on the latest version of the tuple
(including, I guess, waiting for an uncommitted update?) and throw
error if not. I think this is what your B3 option is, but maybe
I misinterpreted. It might be better to throw error immediately
instead of waiting to see if the other updater commits.

regards, tom lane

#9Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#8)
Re: race condition in pg_class

On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote:

We'll likely need to change how we maintain relhasindex or perhaps take a lock
in GRANT.

The main choice is accepting more DDL blocking vs. accepting inefficient
relcache builds. Options I'm seeing:

It looks to me like you're only thinking about relhasindex, but it
seems to me that any call of heap_inplace_update brings some
risk of this kind. Excluding the bootstrap-mode-only usage in
create_toast_table, I see four callers:

* index_update_stats updating a pg_class tuple's
relhasindex, relpages, reltuples, relallvisible

* vac_update_relstats updating a pg_class tuple's
relpages, reltuples, relallvisible, relhasindex, relhasrules,
relhastriggers, relfrozenxid, relminmxid

* vac_update_datfrozenxid updating a pg_database tuple's
datfrozenxid, datminmxid

* dropdb updating a pg_database tuple's datconnlimit

So we have just as much of a problem with GRANTs on databases
as GRANTs on relations. Also, it looks like we can lose
knowledge of the presence of rules and triggers, which seems
nearly as bad as forgetting about indexes. The rest of these
updates might not be correctness-critical, although I wonder
how bollixed things could get if we forget an advancement of
relfrozenxid or datfrozenxid (especially if the calling
transaction goes on to make other changes that assume that
the update happened).

Thanks for researching that. Let's treat frozenxid stuff as critical; I
wouldn't want to advance XID limits based on a datfrozenxid that later gets
rolled back. I agree relhasrules and relhastriggers are also critical. The
"inefficient relcache builds" option family can't solve cases like
relfrozenxid and datconnlimit, so that leaves us with the "more DDL blocking"
option family.

BTW, vac_update_datfrozenxid believes (correctly I think) that
it cannot use the syscache copy of a tuple as the basis for in-place
update, because syscache will have detoasted any toastable fields.
These other callers are ignoring that, which seems like it should
result in heap_inplace_update failing with "wrong tuple length".
I wonder how come we're not seeing reports of that from the field.

Good question. Perhaps we'll need some test cases that exercise each inplace
update against a row having a toast pointer. It's too easy to go a long time
without encountering those in the field.
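
Sketch of one such case (assumes pg_class's toast table, added in v12, and
that enough grantees push relacl out of line despite compression):

DO $$
BEGIN
  FOR i IN 1..5000 LOOP
    EXECUTE format('CREATE ROLE bloat_role_%s', i);
    EXECUTE format('GRANT SELECT ON example TO bloat_role_%s', i);
  END LOOP;
END$$;
VACUUM example;  -- vac_update_relstats() inplace update on the toasted row

If your analysis holds, the VACUUM would fail with "wrong tuple length".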

I'm inclined to propose that heap_inplace_update should check to
make sure that it's operating on the latest version of the tuple
(including, I guess, waiting for an uncommitted update?) and throw
error if not. I think this is what your B3 option is, but maybe
I misinterpreted. It might be better to throw error immediately
instead of waiting to see if the other updater commits.

That's perhaps closer to B2. To be pedantic, B3 was about not failing or
waiting for GRANT to commit but instead inplace-updating every member of the
update chain. For B2, I was thinking we don't need to error. There are two
problematic orders of events. The easy one is heap_inplace_update() mutating
a tuple that already has an xmax. That's the one in the test case upthread,
and detecting it is trivial. The harder one is heap_inplace_update() mutating
a tuple after GRANT fetches the old tuple, before GRANT enters heap_update().
I anticipate a new locktag per catalog that can receive inplace updates,
i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION. Here's a
walk-through for the pg_database case. GRANT will use the following sequence
of events:

- acquire LOCKTAG_DATABASE_DEFINITION in exclusive mode
- fetch latest pg_database tuple
- heap_update()
- COMMIT, releasing LOCKTAG_DATABASE_DEFINITION

vac_update_datfrozenxid() sequence of events:

- acquire LOCKTAG_DATABASE_DEFINITION in exclusive mode
- (now, all GRANTs on the given database have committed or aborted)
- fetch latest pg_database tuple
- heap_inplace_update()
- release LOCKTAG_DATABASE_DEFINITION, even if xact not ending
- continue with other steps, e.g. vac_truncate_clog()
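
In C terms, the vac_update_datfrozenxid() side might look like this
(hypothetical sketch; LOCKTAG_DATABASE_DEFINITION and its SET_LOCKTAG
macro don't exist, while LockAcquire/LockRelease are the real lmgr calls):

	LOCKTAG		tag;

	SET_LOCKTAG_DATABASE_DEFINITION(tag, MyDatabaseId);	/* hypothetical */
	(void) LockAcquire(&tag, ExclusiveLock, false, false);
	/* fetch latest pg_database tuple, then heap_inplace_update() */
	LockRelease(&tag, ExclusiveLock, false);	/* before xact end */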

How does that compare to what you envisioned? vac_update_datfrozenxid() could
further use xmax as a best-efforts thing to catch conflict with manual UPDATE
statements, but it wouldn't solve the case where the UPDATE had fetched the
tuple but not yet heap_update()'d it.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#9)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:

I'm inclined to propose that heap_inplace_update should check to
make sure that it's operating on the latest version of the tuple
(including, I guess, waiting for an uncommitted update?) and throw
error if not. I think this is what your B3 option is, but maybe
I misinterpreted. It might be better to throw error immediately
instead of waiting to see if the other updater commits.

That's perhaps closer to B2. To be pedantic, B3 was about not failing or
waiting for GRANT to commit but instead inplace-updating every member of the
update chain. For B2, I was thinking we don't need to error. There are two
problematic orders of events. The easy one is heap_inplace_update() mutating
a tuple that already has an xmax. That's the one in the test case upthread,
and detecting it is trivial. The harder one is heap_inplace_update() mutating
a tuple after GRANT fetches the old tuple, before GRANT enters heap_update().

Ugh ... you're right, what I was imagining would not catch that last case.

I anticipate a new locktag per catalog that can receive inplace updates,
i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION.

We could perhaps make this work by using the existing tuple-lock
infrastructure, rather than inventing new locktags (a choice that
spills to a lot of places including clients that examine pg_locks).

I would prefer though to find a solution that only depends on making
heap_inplace_update protect itself, without high-level cooperation
from the possibly-interfering updater. This is basically because
I'm still afraid that we're defining the problem too narrowly.
For one thing, I have nearly zero confidence that GRANT et al are
the only problematic source of conflicting transactional updates.
For another, I'm worried that some extension may be using
heap_inplace_update against a catalog we're not considering here.
I'd also like to find a solution that fixes the case of a conflicting
manual UPDATE (although certainly that's a stretch goal we may not be
able to reach).

I wonder if there's a way for heap_inplace_update to mark the tuple
header as just-updated in a way that regular heap_update could
recognize. (For standard catalog updates, we'd then end up erroring
in simple_heap_update, which I think is fine.) We can't update xmin,
because the VACUUM callers don't have an XID; but maybe there's some
other way? I'm speculating about putting a funny value into xmax,
or something like that, and having heap_update check that what it
sees in xmax matches what was in the tuple the update started with.

Or we could try to get rid of in-place updates, but that seems like
a mighty big lift. All of the existing callers, except maybe
the johnny-come-lately dropdb usage, have solid documented reasons
to do it that way.

regards, tom lane

#11Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#10)
Re: race condition in pg_class

On Fri, Oct 27, 2023 at 06:40:55PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:

I'm inclined to propose that heap_inplace_update should check to
make sure that it's operating on the latest version of the tuple
(including, I guess, waiting for an uncommitted update?) and throw
error if not. I think this is what your B3 option is, but maybe
I misinterpreted. It might be better to throw error immediately
instead of waiting to see if the other updater commits.

That's perhaps closer to B2. To be pedantic, B3 was about not failing or
waiting for GRANT to commit but instead inplace-updating every member of the
update chain. For B2, I was thinking we don't need to error. There are two
problematic orders of events. The easy one is heap_inplace_update() mutating
a tuple that already has an xmax. That's the one in the test case upthread,
and detecting it is trivial. The harder one is heap_inplace_update() mutating
a tuple after GRANT fetches the old tuple, before GRANT enters heap_update().

Ugh ... you're right, what I was imagining would not catch that last case.

I anticipate a new locktag per catalog that can receive inplace updates,
i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION.

We could perhaps make this work by using the existing tuple-lock
infrastructure, rather than inventing new locktags (a choice that
spills to a lot of places including clients that examine pg_locks).

That could be okay. It would be weird to reuse a short-term lock like that
one as something held till end of transaction. But the alternative of new
locktags ain't perfect, as you say.

I would prefer though to find a solution that only depends on making
heap_inplace_update protect itself, without high-level cooperation
from the possibly-interfering updater. This is basically because
I'm still afraid that we're defining the problem too narrowly.
For one thing, I have nearly zero confidence that GRANT et al are
the only problematic source of conflicting transactional updates.

Likewise here, but I have fair confidence that an assertion would flush out
the rest. heap_inplace_update() would assert that the backend holds one of
the acceptable locks. It could even be an elog; heap_inplace_update() can
tolerate that cost.

For another, I'm worried that some extension may be using
heap_inplace_update against a catalog we're not considering here.

A pgxn search finds "citus" using heap_inplace_update().

I'd also like to find a solution that fixes the case of a conflicting
manual UPDATE (although certainly that's a stretch goal we may not be
able to reach).

It would be nice.

I wonder if there's a way for heap_inplace_update to mark the tuple
header as just-updated in a way that regular heap_update could
recognize. (For standard catalog updates, we'd then end up erroring
in simple_heap_update, which I think is fine.) We can't update xmin,
because the VACUUM callers don't have an XID; but maybe there's some
other way? I'm speculating about putting a funny value into xmax,
or something like that, and having heap_update check that what it
sees in xmax matches what was in the tuple the update started with.

Hmmm. Achieving it without an XID would be the real trick. (With an XID, we
could use xl_heap_lock like heap_update() does.) Thinking out loud, what if
heap_inplace_update() sets HEAP_XMAX_INVALID and xmax =
TransactionIdAdvance(xmax)? Or change t_ctid in a similar way. Then regular
heap_update() could complain if the field changed vs. last seen value. This
feels like something to regret later in terms of limiting our ability to
harness those fields for more-valuable ends or compact them away in a future
page format. I can't pinpoint a specific loss, so the idea might have legs.
Nontransactional data in separate tables or in new metapages smells like the
right long-term state. A project wanting to reuse the tuple header bits could
introduce such storage to unblock its own bit reuse.

Or we could try to get rid of in-place updates, but that seems like
a mighty big lift. All of the existing callers, except maybe
the johnny-come-lately dropdb usage, have solid documented reasons
to do it that way.

Yes, removing that smells problematic.

#12Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#11)
2 attachment(s)
Re: race condition in pg_class

I prototyped two ways, one with a special t_ctid and one with LockTuple().

On Fri, Oct 27, 2023 at 04:26:12PM -0700, Noah Misch wrote:

On Fri, Oct 27, 2023 at 06:40:55PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:

I anticipate a new locktag per catalog that can receive inplace updates,
i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION.

We could perhaps make this work by using the existing tuple-lock
infrastructure, rather than inventing new locktags (a choice that
spills to a lot of places including clients that examine pg_locks).

That could be okay. It would be weird to reuse a short-term lock like that
one as something held till end of transaction. But the alternative of new
locktags ain't perfect, as you say.

That worked.

I would prefer though to find a solution that only depends on making
heap_inplace_update protect itself, without high-level cooperation
from the possibly-interfering updater. This is basically because
I'm still afraid that we're defining the problem too narrowly.
For one thing, I have nearly zero confidence that GRANT et al are
the only problematic source of conflicting transactional updates.

Likewise here, but I have fair confidence that an assertion would flush out
the rest. heap_inplace_update() would assert that the backend holds one of
the acceptable locks. It could even be an elog; heap_inplace_update() can
tolerate that cost.

That check would fall in both heap_inplace_update() and heap_update(). After
all, a heap_inplace_update() check won't detect an omission in GRANT.

For another, I'm worried that some extension may be using
heap_inplace_update against a catalog we're not considering here.

A pgxn search finds "citus" using heap_inplace_update().

I'd also like to find a solution that fixes the case of a conflicting
manual UPDATE (although certainly that's a stretch goal we may not be
able to reach).

It would be nice.

I expect most approaches could get there by having ExecModifyTable() arrange
for the expected locking or other actions. That's analogous to how
heap_update() takes care of sinval even for a manual UPDATE.

I wonder if there's a way for heap_inplace_update to mark the tuple
header as just-updated in a way that regular heap_update could
recognize. (For standard catalog updates, we'd then end up erroring
in simple_heap_update, which I think is fine.) We can't update xmin,
because the VACUUM callers don't have an XID; but maybe there's some
other way? I'm speculating about putting a funny value into xmax,
or something like that, and having heap_update check that what it
sees in xmax matches what was in the tuple the update started with.

Hmmm. Achieving it without an XID would be the real trick. (With an XID, we
could use xl_heap_lock like heap_update() does.) Thinking out loud, what if
heap_inplace_update() sets HEAP_XMAX_INVALID and xmax =
TransactionIdAdvance(xmax)? Or change t_ctid in a similar way. Then regular
heap_update() could complain if the field changed vs. last seen value. This
feels like something to regret later in terms of limiting our ability to
harness those fields for more-valuable ends or compact them away in a future
page format. I can't pinpoint a specific loss, so the idea might have legs.
Nontransactional data in separate tables or in new metapages smells like the
right long-term state. A project wanting to reuse the tuple header bits could
introduce such storage to unblock its own bit reuse.

heap_update() does not have the pre-modification xmax today, so I used t_ctid.
heap_modify_tuple() preserves t_ctid, so heap_update() already has the
pre-modification t_ctid in key cases. For details of how the prototype uses
t_ctid, see comment at "#define InplaceCanaryOffsetNumber". The prototype
doesn't prevent corruption in the following scenario, because the aborted
ALTER TABLE RENAME overwrites the special t_ctid:

== session 1
drop table t;
create table t (c int);
begin;
-- in gdb, set breakpoint on heap_modify_tuple
grant select on t to public;

== session 2
alter table t add primary key (c);
begin; alter table t rename to t2; rollback;

== back in session 1
-- release breakpoint
-- want error (would get it w/o the begin;alter;rollback)
commit;

I'm missing how to mark the tuple in a fashion accessible to a second
heap_update() after a rolled-back heap_update(). The mark needs enough bits
"N" so it's implausible for 2^N inplace updates to happen between GRANT
fetching the old tuple and GRANT completing heap_update(). Looking for bits
that could persist across a rolled-back heap_update(), we have 3 in t_ctid, 2
in t_infomask2, and 0 in xmax. I definitely don't want to paint us into a
corner by spending the t_infomask2 bits on this. Even if I did, 2^(3+2)=32
wouldn't clearly be enough inplace updates.

Is there a way to salvage the goal of fixing the bug without modifying code
like ExecGrant_common()? If not, I'm inclined to pick from one of the
following designs:

- Acquire ShareUpdateExclusiveLock in GRANT ((B1) from previous list). It
does make GRANT more intrusive; e.g. GRANT will cancel autovacuum. I'm
leaning toward this one for two reasons. First, it doesn't slow
heap_update() outside of assert builds. Second, it makes the GRANT
experience more like the rest our DDL, in that concurrent DDL will make
GRANT block, not fail.

- GRANT passes to heapam the fixed-size portion of the pre-modification tuple.
heap_update() compares those bytes to the oldtup in shared buffers to see if
an inplace update happened. (HEAD could get the bytes from a new
heap_update() parameter, while back branches would need a different passing
approach.)

- LockTuple(), as seen in its attached prototype. I like this least at the
moment, because it changes pg_locks content without having a clear advantage
over the previous option. Also, the prototype has enough FIXME markers that
I expect this to get hairy before it's done.

I might change my preference after further prototypes. Does anyone have a
strong preference between those? Briefly, I did consider these additional
alternatives:

- Just accept the yet-rarer chance of corruption from this message's test
procedure.

- Hold a buffer lock long enough to solve things.

- Remember the tuples where we overwrote a special t_ctid, and reverse the
overwrite during abort processing. But I/O in the abort path sounds bad.

Thanks,
nm

Attachments:

intra-grant-inplace-via-ctid-v0.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de815..c0c33e9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2801,6 +2801,10 @@ l1:
 	HeapTupleHeaderSetXmax(tp.t_data, new_xmax);
 	HeapTupleHeaderSetCmax(tp.t_data, cid, iscombo);
 	/* Make sure there is no forward chain link in t_ctid */
+	/*
+	 * FIXME this will overwrite any InplaceCanary ctid.  Leave c_ctid
+	 * unchanged?  Accept that a rolled-back DROP could undo the protection?
+	 */
 	tp.t_data->t_ctid = tp.t_self;
 
 	/* Signal that this is actually a move into another partition */
@@ -3348,6 +3352,8 @@ l2:
 		if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
 		{
 			result = TM_Updated;
+			/* FIXME will InplaceCanary mechanism have race conditions w/ this
+			 * or other t_ctid asserts in this file? */
 			Assert(!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid));
 		}
 	}
@@ -3730,6 +3736,13 @@ l2:
 										   id_has_external,
 										   &old_key_copied);
 
+	/* FIXME missing anything important via the IsValid tests? */
+	if (ItemPointerIsValid(&oldtup.t_data->t_ctid) &&
+		ItemPointerIsValid(&newtup->t_data->t_ctid) &&
+		ItemPointerGetOffsetNumber(&oldtup.t_data->t_ctid) == InplaceCanaryOffsetNumber &&
+		!ItemPointerEquals(&oldtup.t_data->t_ctid, &newtup->t_data->t_ctid))
+		elog(ERROR, "tuple concurrently updated");
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -4747,6 +4760,11 @@ failed:
 	 * updated, we need to follow the update chain to lock the new versions of
 	 * the tuple as well.
 	 */
+	/*
+	 * FIXME this will overwrite any InplaceCanary ctid.  Leave c_ctid
+	 * unchanged?  Accept that SELECT FOR UPDATE on the catalog table could
+	 * undo the protection?
+	 */
 	if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
 		tuple->t_data->t_ctid = *tid;
 
@@ -5892,6 +5910,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	HeapTupleHeader htup;
 	uint32		oldlen;
 	uint32		newlen;
+	HeapTupleData oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -5922,12 +5941,29 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
 
+	/* evaluate whether updated */
+	/* FIXME wait for a concurrent updater, in case it aborts */
+	oldtup.t_tableOid = RelationGetRelid(relation);
+	oldtup.t_data = htup;
+	oldtup.t_len = ItemIdGetLength(lp);
+	oldtup.t_self = tuple->t_self;
+	if (TM_Ok !=
+		HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false), buffer))
+		elog(ERROR, "tuple concurrently updated");
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
 	memcpy((char *) htup + htup->t_hoff,
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
+	if (ItemPointerGetOffsetNumber(&htup->t_ctid) ==
+		InplaceCanaryOffsetNumber)
+		BlockIdRetreat(&htup->t_ctid.ip_blkid);
+	else
+		ItemPointerSet(&htup->t_ctid,
+					   MaxBlockNumber,
+					   InplaceCanaryOffsetNumber);
 
 	MarkBufferDirty(buffer);
 
@@ -5945,6 +5981,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
 		XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);
 
+		/* don't log t_ctid: concurrent changes not happening in recovery */
 		/* inplace updates aren't decoded atm, don't log the origin */
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
diff --git a/src/include/storage/block.h b/src/include/storage/block.h
index 31a036d..f8d9157 100644
--- a/src/include/storage/block.h
+++ b/src/include/storage/block.h
@@ -105,4 +105,14 @@ BlockIdGetBlockNumber(const BlockIdData *blockId)
 	return (((BlockNumber) blockId->bi_hi) << 16) | ((BlockNumber) blockId->bi_lo);
 }
 
+/* wraps on underflow, avoids InvalidBlockNumber */
+static inline void
+BlockIdRetreat(BlockIdData *blockId)
+{
+	BlockNumber proposal = BlockIdGetBlockNumber(blockId) - 1;
+	if (proposal == InvalidBlockNumber)
+		proposal = MaxBlockNumber;
+	BlockIdSet(blockId, proposal);
+}
+
 #endif							/* BLOCK_H */
diff --git a/src/include/storage/off.h b/src/include/storage/off.h
index 3540308..8ff776a 100644
--- a/src/include/storage/off.h
+++ b/src/include/storage/off.h
@@ -27,6 +27,19 @@ typedef uint16 OffsetNumber;
 #define FirstOffsetNumber		((OffsetNumber) 1)
 #define MaxOffsetNumber			((OffsetNumber) (BLCKSZ / sizeof(ItemIdData)))
 
+/*
+ * If a t_ctid contains InplaceCanaryOffsetNumber, it's a special ctid
+ * signifying that the tuple received a heap_inplace_update().  This is
+ * expected only in system catalogs, though extensions can use it elsewhere.
+ * (Offsets greater than MaxOffsetNumber are otherwise unattested.)  The
+ * BlockNumber acts as a counter to distinguish multiple inplace updates.  It
+ * starts at MaxBlockNumber, counts down to 0, and wraps back to
+ * MaxBlockNumber after 4B inplace updates.  (Counting upward or other ways
+ * would be fine, but this choice maximizes these special TIDs looking
+ * different from regular TIDs.)
+ */
+#define InplaceCanaryOffsetNumber	(InvalidOffsetNumber - 1)
+
 /* ----------------
  *		support macros
  * ----------------
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..e7404e8
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,29 @@
+Parsed test spec with 2 sessions
+
+starting permutation: b1 g1 s2 a2 s2 c1 s2
+step b1: BEGIN ISOLATION LEVEL READ COMMITTED;
+step g1: GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
+step a2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+ERROR:  tuple concurrently updated
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
+step c1: COMMIT;
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index b2be88e..fcc4ad4 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -86,6 +86,7 @@ test: alter-table-3
 test: alter-table-4
 test: create-trigger
 test: sequence-ddl
+test: intra-grant-inplace
 test: async-notify
 test: vacuum-no-cleanup-lock
 test: timeouts
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..b5720c5
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,29 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, such as
+# relhasindex=true, need special code to cope.
+#
+# FIXME this is an isolation test on the assumption that I'll change
+# heap_inplace_update() to wait for the GRANT to commit or abort.  If I don't
+# do that, this could be a non-isolation test using dblink().
+
+setup
+{
+  CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+  DROP TABLE intra_grant_inplace;
+}
+
+session s1
+step b1  { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step g1  { GRANT SELECT ON intra_grant_inplace TO PUBLIC; }
+step c1  { COMMIT; }
+
+session s2
+step s2  { SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass; }
+step a2  { ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+
+permutation b1 g1 s2 a2 s2 c1 s2
intra-grant-inplace-via-tuplock-v0.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de815..8e51557 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3730,6 +3730,15 @@ l2:
 										   id_has_external,
 										   &old_key_copied);
 
+	/*
+	 * FIXME fail if updating pg_class without holding either (a)
+	 * LOCKTAG_RELATION on this row's rel, in a mode conflicting w/ ShareLock,
+	 * or (b) LOCKTAG_TUPLE.  Fail if updating pg_database without holding
+	 * either (a) LOCKTAG_OBJECT on this row's database OID, in FIXME mode, or
+	 * (b) LOCKTAG_TUPLE.  This may imply the executor taking the tuple locks
+	 * during SQL UPDATE of those tables.
+	 */
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -5892,6 +5901,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	HeapTupleHeader htup;
 	uint32		oldlen;
 	uint32		newlen;
+	HeapTupleData oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -5904,6 +5914,8 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+	/* FIXME fail unless holding LockTuple(relation, &tuple->t_self, AccessExclusiveLock) */
+
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 	page = (Page) BufferGetPage(buffer);
@@ -5922,6 +5934,16 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
 
+	/* evaluate whether updated */
+	/* FIXME wait for a concurrent updater, in case it aborts */
+	oldtup.t_tableOid = RelationGetRelid(relation);
+	oldtup.t_data = htup;
+	oldtup.t_len = ItemIdGetLength(lp);
+	oldtup.t_self = tuple->t_self;
+	if (TM_Ok !=
+		HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false), buffer))
+		elog(ERROR, "tuple concurrently updated");
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 3ce6c09..dba3137 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -1853,7 +1853,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(relation, RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2190,7 +2190,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(relation, cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 143fae0..7c02380 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2860,11 +2860,19 @@ index_update_stats(Relation rel,
 		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
 		tuple = heap_copytuple(tuple);
 		table_endscan(pg_class_scan);
+		LockTuple(pg_class, &tuple->t_self, AccessExclusiveLock);
+		/* FIXME loop in case the pg_class tuple for pg_class moved */
 	}
 	else
 	{
+		HeapTuple cachetup;
+
 		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+		cachetup = SearchSysCacheLocked1(pg_class,
+										 RELOID, ObjectIdGetDatum(relid));
+		tuple = heap_copytuple(cachetup);
+		if (HeapTupleIsValid(cachetup))
+			ReleaseSysCache(cachetup);
 	}
 
 	if (!HeapTupleIsValid(tuple))
@@ -2933,6 +2941,7 @@ index_update_stats(Relation rel,
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
 
+	UnlockTuple(pg_class, &tuple->t_self, AccessExclusiveLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 8dbda00..76be8b5 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -76,6 +76,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/pg_user_mapping.h"
 #include "lib/qunique.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -827,6 +828,70 @@ SearchSysCache1(int cacheId,
 	return SearchCatCache1(SysCache[cacheId], key1);
 }
 
+/*
+ * Like SearchSysCache1, but a LOCKTAG_TUPLE heavyweight lock is held in
+ * AccessExclusiveMode on return.
+ *
+ * pg_class and pg_database are subject to both heap_inplace_update() and
+ * regular heap_update().  We must not lose the effects of any inplace update.
+ * That might arise by heap_inplace_update() mutating a tuple, the xmax of
+ * which then commits.  Alternatively, it could arise by heap_inplace_update()
+ * mutating a tuple before an ongoing transaction's heap_update(), after the
+ * ongoing transaction fetches the old tuple to modify.  We prevent both with
+ * locking as follows.  All heap_inplace_update(pg_class) callers acquire two
+ * locks.  First, they acquire LOCKTAG_RELATION in mode ShareLock,
+ * ShareUpdateExclusiveLock, or a mode with strictly more conflicts.  Second,
+ * they acquire LOCKTAG_TUPLE.  heap_update(pg_class) callers can either take
+ * any lock that conflicts with one of those two.  Acquire one's choice of
+ * lock before fetching the tuple to be modified.  FIXME write corresponding
+ * story for pg_database.
+ *
+ * This function bundles the tasks of retrieving a tuple and acquiring the
+ * LOCKTAG_TUPLE.  Since much DDL opts to acquire LOCKTAG_RELATION, the
+ * LOCKTAG_TUPLE doesn't block such updaters of the returned tuple.  Callers
+ * shall reject superseded tuples, such as by checking with
+ * HeapTupleSatisfiesUpdate() while holding an exclusive lock on the tuple's
+ * buffer.
+ *
+ * We opt to loop until the search finds the same tuple we've locked.  FIXME
+ * This might not be strictly necessary, but it likely avoids some spurious
+ * errors.  In the happy case, this takes two fetches, one to determine the
+ * tid to lock and another to confirm that the TID remains the latest tuple.
+ *
+ * FIXME consider dropping Relation arg and deducing applicable locktag fields
+ * from cacheId.
+ *
+ * FIXME this may belong in a different file, with available key counts other
+ * than 1, etc.
+ */
+HeapTuple
+SearchSysCacheLocked1(Relation rel,
+					  int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	bool retry = false;
+
+	for (;;)
+	{
+		HeapTuple	tuple;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (!HeapTupleIsValid(tuple) ||
+			(retry && ItemPointerEquals(&tid, &tuple->t_self)))
+			return tuple;
+
+		if (retry)
+			UnlockTuple(rel, &tid, AccessExclusiveLock);
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		LockTuple(rel, &tid, AccessExclusiveLock);
+		retry = true;
+	}
+}
+
+
 HeapTuple
 SearchSysCache2(int cacheId,
 				Datum key1, Datum key2)
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 5d47a65..aba5385 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -18,6 +18,7 @@
 
 #include "access/attnum.h"
 #include "access/htup.h"
+#include "utils/rel.h"
 /* we intentionally do not include utils/catcache.h here */
 
 /*
@@ -174,6 +175,10 @@ extern bool RelationInvalidatesSnapshotsOnly(Oid relid);
 extern bool RelationHasSysCache(Oid relid);
 extern bool RelationSupportsSysCache(Oid relid);
 
+extern HeapTuple SearchSysCacheLocked1(Relation rel,
+									   int cacheId,
+									   Datum key1);
+
 /*
  * The use of the macros below rather than direct calls to the corresponding
  * functions is encouraged, as it insulates the caller from changes in the
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..a1eed4e
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,23 @@
+Parsed test spec with 2 sessions
+
+starting permutation: b1 g1 s2 a2 c1 s2
+step b1: BEGIN ISOLATION LEVEL READ COMMITTED;
+step g1: GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
+step a2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step c1: COMMIT;
+step a2: <... completed>
+ERROR:  tuple concurrently updated
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index b2be88e..fcc4ad4 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -86,6 +86,7 @@ test: alter-table-3
 test: alter-table-4
 test: create-trigger
 test: sequence-ddl
+test: intra-grant-inplace
 test: async-notify
 test: vacuum-no-cleanup-lock
 test: timeouts
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..85059ed
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,25 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, such as
+# relhasindex=true, need special code to cope.
+
+setup
+{
+  CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+  DROP TABLE intra_grant_inplace;
+}
+
+session s1
+step b1  { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step g1  { GRANT SELECT ON intra_grant_inplace TO PUBLIC; }
+step c1  { COMMIT; }
+
+session s2
+step s2  { SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass; }
+step a2  { ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+
+permutation b1 g1 s2 a2 c1 s2
#13Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#12)
10 attachment(s)
Re: race condition in pg_class

I'm attaching patches implementing the LockTuple() design. It turns out we
don't just lose inplace updates. We also overwrite unrelated tuples,
reproduced at inplace.spec. Good starting points are README.tuplock and the
heap_inplace_update_scan() header comment.

On Wed, Nov 01, 2023 at 08:09:15PM -0700, Noah Misch wrote:

On Fri, Oct 27, 2023 at 04:26:12PM -0700, Noah Misch wrote:

On Fri, Oct 27, 2023 at 06:40:55PM -0400, Tom Lane wrote:

We could perhaps make this work by using the existing tuple-lock
infrastructure, rather than inventing new locktags (a choice that
spills to a lot of places including clients that examine pg_locks).

I'd also like to find a solution that fixes the case of a conflicting
manual UPDATE (although certainly that's a stretch goal we may not be
able to reach).

I implemented that; search for ri_needLockTagTuple.

- GRANT passes to heapam the fixed-size portion of the pre-modification tuple.
heap_update() compares those bytes to the oldtup in shared buffers to see if
an inplace update happened. (HEAD could get the bytes from a new
heap_update() parameter, while back branches would need a different passing
approach.)

This could have been fine, but ...

- LockTuple(), as seen in its attached prototype. I like this least at the
moment, because it changes pg_locks content without having a clear advantage
over the previous option.

... I settled on the LockTuple() design for these reasons:

- Solves more conflicts by waiting, instead of by ERROR or by retry loops.
- Extensions wanting inplace updates don't have a big disadvantage over core
code inplace updates.
- One could use this to stop "tuple concurrently updated" for pg_class rows,
by using SearchSysCacheLocked1() for all pg_class DDL and making that
function wait for any existing xmax like inplace_xmax_lock() does. I don't
expect to write that, but it's a nice option to have.
- pg_locks shows the new lock acquisitions.

Separable, nontrivial things not fixed in the attached patch stack:

- Inplace update uses transactional CacheInvalidateHeapTuple(). ROLLBACK of
CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
still seen in inplace-inval.spec. CacheInvalidateRelmap() does this right.

- AtEOXact_Inval(true) is outside the RecordTransactionCommit() critical
section, but it is critical. We must not commit transactional DDL without
other backends receiving an inval. (When the inplace inval becomes
nontransactional, it will face the same threat.)

- Trouble is possible, I bet, if the system crashes between the inplace-update
memcpy() and XLogInsert(). See the new XXX comment below the memcpy().
Might solve this by having the inplace update set DELAY_CHKPT, write WAL, and
only then issue the memcpy() into the buffer; see the sketch after this list.

- [consequences limited to transient failure] Since a PROC_IN_VACUUM backend's
xmin does not stop pruning, an MVCC scan in that backend can find zero
tuples when one is live. This is like what all backends got in the days of
SnapshotNow catalog scans. See the pgbench test suite addition. (Perhaps
the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
setting that flag later and unsetting it earlier.)
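
For the crash-ordering item above, the floated fix would look roughly like
this (a sketch only; today's code issues the memcpy() before XLogInsert(),
and the DELAY_CHKPT_START usage is my guess at the mechanism):

	/* Keep a checkpoint from falling between the WAL record and the data
	 * change, and write WAL before touching the buffer. */
	MyProc->delayChkptFlags |= DELAY_CHKPT_START;
	/* XLogBeginInsert()/XLogRegister*() calls elided */
	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
	START_CRIT_SECTION();
	memcpy(dst, src, newlen);
	MarkBufferDirty(buffer);
	PageSetLSN(page, recptr);
	END_CRIT_SECTION();
	MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;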

If you find that decisions in this thread's patches are tied to any of those
items, such that I should not separate them, let's discuss that. Topics in the
patches that I feel are most fruitful for debate:

- This makes inplace update block if the tuple has an updater. It's like one
GRANT blocking another, except an inplace updater won't get "ERROR: tuple
concurrently updated" like one of the GRANTs would. I had implemented
versions that avoided this blocking by mutating each tuple in the updated
tuple chain. That worked, but it had corner cases bad for maintainability,
listed in the inplace_xmax_lock() header comment. I'd rather accept the
blocking, so hackers can rule out those corner cases. A long-running GRANT
already hurts VACUUM progress more just by keeping an XID running.

- Pre-checks could make heap_inplace_update_cancel() calls rarer. Avoiding
one of those avoids an exclusive buffer lock, and it avoids waiting on
concurrent heap_update() if any. We'd pre-check the syscache tuple.
EventTriggerOnLogin() does it that way, because the code was already in that
form. I expect only vac_update_datfrozenxid() concludes !dirty enough to
matter. I didn't bother with the optimization, but it would be simple; see
the sketch after this list.

- If any Citus extension user feels like porting its heap_inplace_update()
call to this, I'd value hearing about your experience.

- I paid more than my usual attention to test coverage, considering the patch
stack's intensity compared to most back-patch bug fixes.
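
As an example of the pre-check shape (a sketch under my assumptions, using
vac_update_datfrozenxid()'s simplest field; the cached tuple can be stale,
so this only skips work and doesn't replace the locked re-check):

	/* Skip the inplace-update machinery when nothing would change. */
	HeapTuple	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
	Form_pg_database db = (Form_pg_database) GETSTRUCT(tup);
	bool		dirty = !TransactionIdEquals(db->datfrozenxid, newFrozenXid);

	ReleaseSysCache(tup);
	if (!dirty)
		return;		/* skips the exclusive buffer lock and any LockTuple() wait */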

I've kept all the above topics brief; feel free to ask for more details.

Thanks,
nm

Attachments:

inplace005-UNEXPECTEDPASS-tap-meson-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make TAP todo_start effects the same under Meson and prove_check.
    
    This could have caused spurious failures only on SPARC Linux, because
    today's only todo_start tests for that platform.  Back-patch to v16,
    where Meson support first appeared.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/tools/testwrap b/src/tools/testwrap
index d01e610..9a270be 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -41,12 +41,22 @@ env_dict = {**os.environ,
             'TESTDATADIR': os.path.join(testdir, 'data'),
             'TESTLOGDIR': os.path.join(testdir, 'log')}
 
-sp = subprocess.run(args.test_command, env=env_dict)
+sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
+# Meson categorizes a passing TODO test point as bad
+# (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
+# directive, so Meson computes the file result like Perl does.  This could
+# have the side effect of delaying stdout lines relative to stderr.  That
+# doesn't affect the log file, and the TAP protocol uses stdout only.
+for line in sp.stdout:
+    if line.startswith(b'ok '):
+        line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
+    sys.stdout.buffer.write(line)
+returncode = sp.wait()
 
-if sp.returncode == 0:
+if returncode == 0:
     print('# test succeeded')
     open(os.path.join(testdir, 'test.success'), 'x')
 else:
     print('# test failed')
     open(os.path.join(testdir, 'test.fail'), 'x')
-sys.exit(sp.returncode)
+sys.exit(returncode)
inplace010-tests-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Improve test coverage for changes to inplace-updated catalogs.
    
    This covers both regular and inplace changes, since bugs arise at their
    intersection.  Where marked, these witness extant bugs.  Back-patch to
    v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index d0a86a2..4d5f8d2 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -68,6 +68,34 @@ $node->pgbench(
 		  "CREATE TYPE pg_temp.e AS ENUM ($labels); DROP TYPE pg_temp.e;"
 	});
 
+# Test inplace updates from VACUUM concurrent with heap_update from GRANT.
+# The PROC_IN_VACUUM environment can't finish MVCC table scans consistently,
+# so this fails rarely.  To reproduce consistently, add a sleep after
+# GetCatalogSnapshot(non-catalog-rel).
+Test::More->builder->todo_start('PROC_IN_VACUUM scan breakage');
+$node->safe_psql('postgres', 'CREATE TABLE ddl_target ()');
+$node->pgbench(
+	'--no-vacuum --client=5 --protocol=prepared --transactions=50',
+	0,
+	[qr{processed: 250/250}],
+	[qr{^$}],
+	'concurrent GRANT/VACUUM',
+	{
+		'001_pgbench_grant@9' => q(
+			DO $$
+			BEGIN
+				PERFORM pg_advisory_xact_lock(42);
+				FOR i IN 1 .. 10 LOOP
+					GRANT SELECT ON ddl_target TO PUBLIC;
+					REVOKE SELECT ON ddl_target FROM PUBLIC;
+				END LOOP;
+			END
+			$$;
+),
+		'001_pgbench_vacuum_ddl_target@1' => "VACUUM ddl_target;",
+	});
+Test::More->builder->todo_end;
+
 # Trigger various connection errors
 $node->pgbench(
 	'no-such-database',
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 0237271..032d420 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -1337,3 +1337,29 @@ a|b|c|   d
 2|2|2|1004
 (2 rows)
 
+
+starting permutation: sys1 sysupd2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysupd2: 
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+ <waiting ...>
+step c1: COMMIT;
+step sysupd2: <... completed>
+step c2: COMMIT;
+
+starting permutation: sys1 sysmerge2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysmerge2: 
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+ <waiting ...>
+step c1: COMMIT;
+step sysmerge2: <... completed>
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/inplace-inval.out b/src/test/isolation/expected/inplace-inval.out
new file mode 100644
index 0000000..67b34ad
--- /dev/null
+++ b/src/test/isolation/expected/inplace-inval.out
@@ -0,0 +1,32 @@
+Parsed test spec with 3 sessions
+
+starting permutation: cachefill3 cir1 cic2 ddl3 read1
+step cachefill3: TABLE newly_indexed;
+c
+-
+(0 rows)
+
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: cir1 cic2 ddl3 read1
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+t          
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
new file mode 100644
index 0000000..432ece5
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -0,0 +1,28 @@
+Parsed test spec with 3 sessions
+
+starting permutation: snap3 b1 grant1 vac2 snap3 c1 cmp3
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step b1: BEGIN;
+step grant1: 
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+
+step vac2: VACUUM (FREEZE);
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step c1: COMMIT;
+step cmp3: 
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+
+?column?              
+----------------------
+datfrozenxid retreated
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..cc1e47a
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,225 @@
+Parsed test spec with 5 sessions
+
+starting permutation: b1 grant1 read2 addk2 c1 read2
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: keyshr5 addk2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+
+starting permutation: keyshr5 b3 sfnku3 addk2 r3
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfnku3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+
+starting permutation: b2 sfnku2 addk2 c2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: keyshr5 b2 sfnku2 addk2 c2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b2 sfnku2 b1 grant1 addk2 c2 c1 read2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b1 grant1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+f          
+(1 row)
+
+s4: WARNING:  got: tuple concurrently updated
+step revoke4: <... completed>
+step r3: ROLLBACK;
+
+starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step drop1: 
+	DROP TABLE intra_grant_inplace;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+(0 rows)
+
+s4: WARNING:  got: tuple concurrently deleted
+step revoke4: <... completed>
+step r3: ROLLBACK;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0342eb3..6da98cf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,9 @@ test: fk-snapshot
 test: subxid-overflow
 test: eval-plan-qual
 test: eval-plan-qual-trigger
+test: inplace-inval
+test: intra-grant-inplace
+test: intra-grant-inplace-db
 test: lock-update-delete
 test: lock-update-traversal
 test: inherit-temp
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index edd6d19..3a74406 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,6 +194,12 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
+# test system class updates
+
+step sys1	{
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+}
+
 
 session s2
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
@@ -282,6 +288,18 @@ step wnested2 {
     );
 }
 
+step sysupd2	{
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+}
+
+step sysmerge2	{
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+}
+
 step c2	{ COMMIT; }
 step r2	{ ROLLBACK; }
 
@@ -380,3 +398,6 @@ permutation simplepartupdate complexpartupdate c1 c2 read_part
 permutation simplepartupdate_route1to2 complexpartupdate_route_err1 c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_route c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_doesnt_route c1 c2 read_part
+
+permutation sys1 sysupd2 c1 c2
+permutation sys1 sysmerge2 c1 c2
diff --git a/src/test/isolation/specs/inplace-inval.spec b/src/test/isolation/specs/inplace-inval.spec
new file mode 100644
index 0000000..d8e1c98
--- /dev/null
+++ b/src/test/isolation/specs/inplace-inval.spec
@@ -0,0 +1,38 @@
+# If a heap_update() caller retrieves its oldtup from a cache, it's possible
+# for that cache entry to predate an inplace update, causing loss of that
+# inplace update.  This arises because the transaction may abort before
+# sending the inplace invalidation message to the shared queue.
+
+setup
+{
+	CREATE TABLE newly_indexed (c int);
+}
+
+teardown
+{
+	DROP TABLE newly_indexed;
+}
+
+session s1
+step cir1	{ BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK; }
+step read1	{
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+}
+
+session s2
+step cic2	{ CREATE INDEX i2 ON newly_indexed (c); }
+
+session s3
+step cachefill3	{ TABLE newly_indexed; }
+step ddl3		{ ALTER TABLE newly_indexed ADD extra int; }
+
+
+permutation
+	cachefill3	# populates the pg_class row in the catcache
+	cir1	# sets relhasindex=true; rollback discards cache inval
+	cic2	# sees relhasindex=true, skips changing it (so no inval)
+	ddl3	# cached row as the oldtup of an update, losing relhasindex
+	read1	# observe damage XXX is an extant bug
+
+# without cachefill3, no bug
+permutation cir1 cic2 ddl3 read1
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
new file mode 100644
index 0000000..bbecd5d
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -0,0 +1,46 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, namely
+# datfrozenxid, need special code to cope.
+
+setup
+{
+	CREATE ROLE regress_temp_grantee;
+}
+
+teardown
+{
+	REVOKE ALL ON DATABASE isolation_regression FROM regress_temp_grantee;
+	DROP ROLE regress_temp_grantee;
+}
+
+# heap_update(pg_database)
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step vac2	{ VACUUM (FREEZE); }
+
+# observe datfrozenxid
+session s3
+setup	{
+	CREATE TEMP TABLE frozen_witness (x xid);
+}
+step snap3	{
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+}
+step cmp3	{
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+}
+
+
+# XXX extant bug
+permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..3cd696b
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,153 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  Inplace updates, such as
+# relhasindex=true, need special code to cope.
+
+setup
+{
+	CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+	DROP TABLE IF EXISTS intra_grant_inplace;
+}
+
+# heap_update()
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+}
+step drop1	{
+	DROP TABLE intra_grant_inplace;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step read2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+}
+step b2		{ BEGIN; }
+step addk2	{ ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+step sfnku2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step c2		{ COMMIT; }
+
+# rowmarks
+session s3
+step b3		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
+step sfnku3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step sfu3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+}
+step r3	{ ROLLBACK; }
+
+# Additional heap_update()
+session s4
+# swallow error message to keep any OID value out of expected output
+step revoke4	{
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+}
+
+# Additional rowmarks
+session s5
+setup	{ BEGIN; }
+step keyshr5	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+}
+teardown	{ ROLLBACK; }
+
+
+# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+
+permutation
+	b1
+	grant1
+	read2
+	addk2(c1)	# inplace waits
+	c1
+	read2
+
+# inplace thru KEY SHARE
+permutation
+	keyshr5
+	addk2
+
+# inplace wait NO KEY UPDATE w/ KEY SHARE
+permutation
+	keyshr5
+	b3
+	sfnku3
+	addk2(r3)
+	r3
+
+# same-xact rowmark
+permutation
+	b2
+	sfnku2
+	addk2
+	c2
+
+# same-xact rowmark in multixact
+permutation
+	keyshr5
+	b2
+	sfnku2
+	addk2
+	c2
+
+permutation
+	b3
+	sfu3
+	b1
+	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
+	read2
+	addk2(c1)	# block in LockTuple() behind grant1
+	r3			# unblock grant1; addk2 now awaits grant1 xmax
+	c1
+	read2
+
+permutation
+	b2
+	sfnku2
+	b1
+	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
+	addk2			# block in LockTuple() behind grant1 = deadlock
+	c2
+	c1
+	read2
+
+# SearchSysCacheLocked1() calling LockRelease()
+permutation
+	b1
+	grant1
+	b3
+	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1
+	r3			# revoke4 unlocks old tuple and finds new
+
+# SearchSysCacheLocked1() finding a tuple, then no tuple
+permutation
+	b1
+	drop1
+	b3
+	sfu3(c1)		# acquire LockTuple(), await drop1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1				# sfu3 locks none; revoke4 unlocks old and finds none
+	r3
diff --git a/src/test/regress/expected/database.out b/src/test/regress/expected/database.out
new file mode 100644
index 0000000..30c0865
--- /dev/null
+++ b/src/test/regress/expected/database.out
@@ -0,0 +1,14 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index eddc1f4..3d33259 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -2691,6 +2691,30 @@ drop cascades to table measurement_y2007m01
 DROP FUNCTION measurement_insert_trigger();
 -- prepare
 RESET SESSION AUTHORIZATION;
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+    oid    
+-----------
+ pg_depend
+(1 row)
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+ oid  
+------
+ 2608
+(1 row)
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 675c567..ddc155c 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -28,7 +28,7 @@ test: strings md5 numerology point lseg line box path polygon circle date time t
 # geometry depends on point, lseg, line, box, path, polygon, circle
 # horology depends on date, time, timetz, timestamp, timestamptz, interval
 # ----------
-test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc
+test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database
 
 # ----------
 # Load huge amounts of data
diff --git a/src/test/regress/sql/database.sql b/src/test/regress/sql/database.sql
new file mode 100644
index 0000000..6c61f2e
--- /dev/null
+++ b/src/test/regress/sql/database.sql
@@ -0,0 +1,16 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql
index 3d5d854..92163ec 100644
--- a/src/test/regress/sql/merge.sql
+++ b/src/test/regress/sql/merge.sql
@@ -1713,6 +1713,23 @@ DROP FUNCTION measurement_insert_trigger();
 -- prepare
 
 RESET SESSION AUTHORIZATION;
+
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
inplace040-waitfuncs-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Create waitfuncs.c for pg_isolation_test_session_is_blocked().
    
    The next commit makes the function inspect an additional non-lock
    contention source, so it no longer fits in lockfuncs.c.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 610ccf2..edb09d4 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -116,6 +116,7 @@ OBJS = \
 	varchar.o \
 	varlena.o \
 	version.o \
+	waitfuncs.o \
 	windowfuncs.o \
 	xid.o \
 	xid8funcs.o \
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 13009cc..e790f85 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -13,7 +13,6 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
-#include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/predicate_internals.h"
@@ -602,84 +601,6 @@ pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS)
 
 
 /*
- * pg_isolation_test_session_is_blocked - support function for isolationtester
- *
- * Check if specified PID is blocked by any of the PIDs listed in the second
- * argument.  Currently, this looks for blocking caused by waiting for
- * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
- * not directly under the isolationtester's control, eg autovacuum.
- *
- * This is an undocumented function intended for use by the isolation tester,
- * and may change in future releases as required for testing purposes.
- */
-Datum
-pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
-{
-	int			blocked_pid = PG_GETARG_INT32(0);
-	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
-	ArrayType  *blocking_pids_a;
-	int32	   *interesting_pids;
-	int32	   *blocking_pids;
-	int			num_interesting_pids;
-	int			num_blocking_pids;
-	int			dummy;
-	int			i,
-				j;
-
-	/* Validate the passed-in array */
-	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
-	if (array_contains_nulls(interesting_pids_a))
-		elog(ERROR, "array must not contain nulls");
-	interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
-	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
-										  ARR_DIMS(interesting_pids_a));
-
-	/*
-	 * Get the PIDs of all sessions blocking the given session's attempt to
-	 * acquire heavyweight locks.
-	 */
-	blocking_pids_a =
-		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
-
-	Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
-	Assert(!array_contains_nulls(blocking_pids_a));
-	blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
-	num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
-									   ARR_DIMS(blocking_pids_a));
-
-	/*
-	 * Check if any of these are in the list of interesting PIDs, that being
-	 * the sessions that the isolation tester is running.  We don't use
-	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
-	 * our goals is to run quickly with debug_discard_caches > 0.  We expect
-	 * blocking_pids to be usually empty and otherwise a very small number in
-	 * isolation tester cases, so make that the outer loop of a naive search
-	 * for a match.
-	 */
-	for (i = 0; i < num_blocking_pids; i++)
-		for (j = 0; j < num_interesting_pids; j++)
-		{
-			if (blocking_pids[i] == interesting_pids[j])
-				PG_RETURN_BOOL(true);
-		}
-
-	/*
-	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
-	 * theory check the resulting array of blocker PIDs against the
-	 * interesting PIDs list, but since there is no danger of autovacuum
-	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
-	 * on allocating a buffer and searching for overlap; so it's presently
-	 * sufficient for the isolation tester's purposes to use a single element
-	 * buffer and check if the number of safe snapshot blockers is non-zero.
-	 */
-	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
-		PG_RETURN_BOOL(true);
-
-	PG_RETURN_BOOL(false);
-}
-
-
-/*
  * Functions for manipulating advisory locks
  *
  * We make use of the locktag fields as follows:
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 48dbcf5..8c6fc80 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -103,6 +103,7 @@ backend_sources += files(
   'varchar.c',
   'varlena.c',
   'version.c',
+  'waitfuncs.c',
   'windowfuncs.c',
   'xid.c',
   'xid8funcs.c',
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
new file mode 100644
index 0000000..d9c92c3
--- /dev/null
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -0,0 +1,96 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitfuncs.c
+ *		Functions for SQL access to syntheses of multiple contention types.
+ *
+ * Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/backend/utils/adt/waitfuncs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "storage/predicate_internals.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+
+
+/*
+ * pg_isolation_test_session_is_blocked - support function for isolationtester
+ *
+ * Check if specified PID is blocked by any of the PIDs listed in the second
+ * argument.  Currently, this looks for blocking caused by waiting for
+ * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
+ * not directly under the isolationtester's control, eg autovacuum.
+ *
+ * This is an undocumented function intended for use by the isolation tester,
+ * and may change in future releases as required for testing purposes.
+ */
+Datum
+pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
+{
+	int			blocked_pid = PG_GETARG_INT32(0);
+	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
+	ArrayType  *blocking_pids_a;
+	int32	   *interesting_pids;
+	int32	   *blocking_pids;
+	int			num_interesting_pids;
+	int			num_blocking_pids;
+	int			dummy;
+	int			i,
+				j;
+
+	/* Validate the passed-in array */
+	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
+	if (array_contains_nulls(interesting_pids_a))
+		elog(ERROR, "array must not contain nulls");
+	interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
+	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
+										  ARR_DIMS(interesting_pids_a));
+
+	/*
+	 * Get the PIDs of all sessions blocking the given session's attempt to
+	 * acquire heavyweight locks.
+	 */
+	blocking_pids_a =
+		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
+
+	Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
+	Assert(!array_contains_nulls(blocking_pids_a));
+	blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
+	num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
+									   ARR_DIMS(blocking_pids_a));
+
+	/*
+	 * Check if any of these are in the list of interesting PIDs, that being
+	 * the sessions that the isolation tester is running.  We don't use
+	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
+	 * our goals is to run quickly with debug_discard_caches > 0.  We expect
+	 * blocking_pids to be usually empty and otherwise a very small number in
+	 * isolation tester cases, so make that the outer loop of a naive search
+	 * for a match.
+	 */
+	for (i = 0; i < num_blocking_pids; i++)
+		for (j = 0; j < num_interesting_pids; j++)
+		{
+			if (blocking_pids[i] == interesting_pids[j])
+				PG_RETURN_BOOL(true);
+		}
+
+	/*
+	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
+	 * theory check the resulting array of blocker PIDs against the
+	 * interesting PIDs list, but since there is no danger of autovacuum
+	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
+	 * on allocating a buffer and searching for overlap; so it's presently
+	 * sufficient for the isolation tester's purposes to use a single element
+	 * buffer and check if the number of safe snapshot blockers is non-zero.
+	 */
+	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
+		PG_RETURN_BOOL(true);
+
+	PG_RETURN_BOOL(false);
+}
inplace050-tests-inj-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Add an injection_points isolation test suite.
    
    Make the isolation harness recognize injection_points wait events as a
    type of blocked state.  To simplify that, change that wait event naming
    scheme to INJECTION_POINT(name).  Add an initial test for an extant
    inplace-update bug.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4be0dee..4eda445 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -63,6 +63,7 @@
 #include "storage/procarray.h"
 #include "storage/standby.h"
 #include "utils/datum.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
@@ -6077,6 +6078,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+	INJECTION_POINT("inplace-before-pin");
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 	page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
index d9c92c3..b524a8a 100644
--- a/src/backend/utils/adt/waitfuncs.c
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -14,8 +14,13 @@
 
 #include "catalog/pg_type.h"
 #include "storage/predicate_internals.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/wait_event.h"
+
+#define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
 
 
 /*
@@ -23,8 +28,9 @@
  *
  * Check if specified PID is blocked by any of the PIDs listed in the second
  * argument.  Currently, this looks for blocking caused by waiting for
- * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
- * not directly under the isolationtester's control, eg autovacuum.
+ * injection points, heavyweight locks, or safe snapshots.  We ignore blockage
+ * caused by PIDs not directly under the isolationtester's control, eg
+ * autovacuum.
  *
  * This is an undocumented function intended for use by the isolation tester,
  * and may change in future releases as required for testing purposes.
@@ -34,6 +40,8 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 {
 	int			blocked_pid = PG_GETARG_INT32(0);
 	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
+	PGPROC	   *proc;
+	const char *wait_event;
 	ArrayType  *blocking_pids_a;
 	int32	   *interesting_pids;
 	int32	   *blocking_pids;
@@ -43,6 +51,17 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 	int			i,
 				j;
 
+	/* Check if blocked_pid is in injection_wait(). */
+	proc = BackendPidGetProc(blocked_pid);
+	if (proc == NULL)
+		PG_RETURN_BOOL(false);	/* session gone: definitely unblocked */
+	wait_event =
+		pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
+	if (wait_event && strncmp("INJECTION_POINT(",
+							  wait_event,
+							  strlen("INJECTION_POINT(")) == 0)
+		PG_RETURN_BOOL(true);
+
 	/* Validate the passed-in array */
 	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
 	if (array_contains_nulls(interesting_pids_a))
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 31bd787..2ffd2f7 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -9,6 +9,8 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
+ISOLATION = inplace
+
 # The injection points are cluster-wide, so disable installcheck
 NO_INSTALLCHECK = 1
 
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
new file mode 100644
index 0000000..123f45a
--- /dev/null
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -0,0 +1,43 @@
+Parsed test spec with 3 sessions
+
+starting permutation: vac1 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+ERROR:  could not create unique index "pg_class_oid_index"
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 5c44625..4193061 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -200,6 +200,7 @@ injection_notice(const char *name, const void *private_data)
 void
 injection_wait(const char *name, const void *private_data)
 {
+	char		event_name[NAMEDATALEN];
 	uint32		old_wait_counts = 0;
 	int			index = -1;
 	uint32		injection_wait_event = 0;
@@ -212,11 +213,11 @@ injection_wait(const char *name, const void *private_data)
 		return;
 
 	/*
-	 * Use the injection point name for this custom wait event.  Note that
-	 * this custom wait event name is not released, but we don't care much for
-	 * testing as this should be short-lived.
+	 * Note that this custom wait event name is not released, but we don't
+	 * care much for testing as this should be short-lived.
 	 */
-	injection_wait_event = WaitEventExtensionNew(name);
+	snprintf(event_name, sizeof(event_name), "INJECTION_POINT(%s)", name);
+	injection_wait_event = WaitEventExtensionNew(event_name);
 
 	/*
 	 * Find a free slot to wait for, and register this injection point's name.
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 8e1b5b4..3c23c14 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -37,4 +37,9 @@ tests += {
     # The injection points are cluster-wide, so disable installcheck
     'runningcheck': false,
   },
+  'isolation': {
+    'specs': [
+      'inplace',
+    ],
+  },
 }
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
new file mode 100644
index 0000000..e957713
--- /dev/null
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -0,0 +1,83 @@
+# Test race conditions involving:
+# - s1: VACUUM inplace-updating a pg_class row
+# - s2: GRANT/REVOKE making pg_class rows dead
+# - s3: "VACUUM pg_class" making dead rows LP_UNUSED; DDL reusing them
+
+# Need GRANT to make a non-HOT update.  Otherwise, "VACUUM pg_class" would
+# leave an LP_REDIRECT that persists.  To get non-HOT, make rels so the
+# pg_class row for vactest.orig50 is on a filled page (assuming BLCKSZ=8192).
+# Just to save on filesystem syscalls, use relkind=c for every other rel.
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA vactest;
+	CREATE FUNCTION vactest.mkrels(text, int, int) RETURNS void
+		LANGUAGE plpgsql SET search_path = vactest AS $$
+	DECLARE
+		tname text;
+	BEGIN
+		FOR i in $2 .. $3 LOOP
+			tname := $1 || i;
+			EXECUTE FORMAT('CREATE TYPE ' || tname || ' AS ()');
+			RAISE DEBUG '% at %', tname, ctid
+				FROM pg_class WHERE oid = tname::regclass;
+		END LOOP;
+	END
+	$$;
+}
+setup	{ VACUUM FULL pg_class;  -- reduce free space }
+setup
+{
+	SELECT vactest.mkrels('orig', 1, 49);
+	CREATE TABLE vactest.orig50 ();
+	SELECT vactest.mkrels('orig', 51, 100);
+}
+
+# XXX DROP causes an assertion failure; adopt DROP once fixed
+teardown
+{
+	--DROP SCHEMA vactest CASCADE;
+	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP EXTENSION injection_points;
+}
+
+# Wait during inplace update, in a VACUUM of vactest.orig50.
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('inplace-before-pin', 'wait');
+}
+step vac1	{ VACUUM vactest.orig50;  -- wait during inplace update }
+# One bug scenario leaves two live pg_class tuples for vactest.orig50 and zero
+# live tuples for one of the "intruder" rels.  REINDEX observes the duplicate.
+step read1	{
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+}
+
+
+# Transactional updates of the tuple vac1 is waiting to inplace-update.
+session s2
+step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
+
+
+# Non-blocking actions.
+session s3
+step vac3		{ VACUUM pg_class; }
+# Reuse the lp that vac1 is waiting to change.  I've observed reuse at the 1st
+# or 18th CREATE, so create excess.
+step mkrels3	{
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+}
+
+
+# XXX extant bug
+permutation
+	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	grant2			# T0 becomes eligible for pruning, T1 is successor
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	read1
diff --git a/src/test/modules/test_misc/t/005_timeouts.pl b/src/test/modules/test_misc/t/005_timeouts.pl
index a792610..18c800c 100644
--- a/src/test/modules/test_misc/t/005_timeouts.pl
+++ b/src/test/modules/test_misc/t/005_timeouts.pl
@@ -50,7 +50,8 @@ $psql_session->query_until(
 
 # Wait until the backend enters the timeout injection point. Will get an error
 # here if anything goes wrong.
-$node->wait_for_event('client backend', 'transaction-timeout');
+$node->wait_for_event('client backend',
+	'INJECTION_POINT(transaction-timeout)');
 
 my $log_offset = -s $node->logfile;
 
@@ -86,7 +87,7 @@ $psql_session->query_until(
 
 # Wait until the backend enters the timeout injection point.
 $node->wait_for_event('client backend',
-	'idle-in-transaction-session-timeout');
+	'INJECTION_POINT(idle-in-transaction-session-timeout)');
 
 $log_offset = -s $node->logfile;
 
@@ -116,7 +117,8 @@ $psql_session->query_until(
 ));
 
 # Wait until the backend enters the timeout injection point.
-$node->wait_for_event('client backend', 'idle-session-timeout');
+$node->wait_for_event('client backend',
+	'INJECTION_POINT(idle-session-timeout)');
 
 $log_offset = -s $node->logfile;
 
diff --git a/src/test/recovery/t/041_checkpoint_at_promote.pl b/src/test/recovery/t/041_checkpoint_at_promote.pl
index 7c30731..538facb 100644
--- a/src/test/recovery/t/041_checkpoint_at_promote.pl
+++ b/src/test/recovery/t/041_checkpoint_at_promote.pl
@@ -78,7 +78,8 @@ $node_primary->wait_for_replay_catchup($node_standby);
 
 # Wait until the checkpointer is in the middle of the restart point
 # processing.
-$node_standby->wait_for_event('checkpointer', 'create-restart-point');
+$node_standby->wait_for_event('checkpointer',
+	'INJECTION_POINT(create-restart-point)');
 
 # Check the logs that the restart point has started on standby.  This is
 # optional, but let's be sure.
inplace060-nodeModifyTable-comments-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Expand comments and add an assertion in nodeModifyTable.c.
    
    Most comments concern RELKIND_VIEW.  One addresses the ExecUpdate()
    "tupleid" parameter.  A later commit will rely on these facts, but they
    hold already.  Back-patch to v12 (all supported versions), the plan for
    that commit.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3..7bcc03c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1398,18 +1398,18 @@ ExecDeleteEpilogue(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  *		DELETE is like UPDATE, except that we delete the tuple and no
  *		index modifications are needed.
  *
- *		When deleting from a table, tupleid identifies the tuple to
- *		delete and oldtuple is NULL.  When deleting from a view,
- *		oldtuple is passed to the INSTEAD OF triggers and identifies
- *		what to delete, and tupleid is invalid.  When deleting from a
- *		foreign table, tupleid is invalid; the FDW has to figure out
- *		which row to delete using data from the planSlot.  oldtuple is
- *		passed to foreign table triggers; it is NULL when the foreign
- *		table has no relevant triggers.  We use tupleDeleted to indicate
- *		whether the tuple is actually deleted, callers can use it to
- *		decide whether to continue the operation.  When this DELETE is a
- *		part of an UPDATE of partition-key, then the slot returned by
- *		EvalPlanQual() is passed back using output parameter epqreturnslot.
+ *		When deleting from a table, tupleid identifies the tuple to delete and
+ *		oldtuple is NULL.  When deleting through a view INSTEAD OF trigger,
+ *		oldtuple is passed to the triggers and identifies what to delete, and
+ *		tupleid is invalid.  When deleting from a foreign table, tupleid is
+ *		invalid; the FDW has to figure out which row to delete using data from
+ *		the planSlot.  oldtuple is passed to foreign table triggers; it is
+ *		NULL when the foreign table has no relevant triggers.  We use
+ *		tupleDeleted to indicate whether the tuple is actually deleted,
+ *		callers can use it to decide whether to continue the operation.  When
+ *		this DELETE is a part of an UPDATE of partition-key, then the slot
+ *		returned by EvalPlanQual() is passed back using output parameter
+ *		epqreturnslot.
  *
  *		Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
@@ -2238,21 +2238,22 @@ ExecCrossPartitionUpdateForeignKey(ModifyTableContext *context,
  *		is, we don't want to get stuck in an infinite loop
  *		which corrupts your database..
  *
- *		When updating a table, tupleid identifies the tuple to
- *		update and oldtuple is NULL.  When updating a view, oldtuple
- *		is passed to the INSTEAD OF triggers and identifies what to
- *		update, and tupleid is invalid.  When updating a foreign table,
- *		tupleid is invalid; the FDW has to figure out which row to
- *		update using data from the planSlot.  oldtuple is passed to
- *		foreign table triggers; it is NULL when the foreign table has
- *		no relevant triggers.
+ *		When updating a table, tupleid identifies the tuple to update and
+ *		oldtuple is NULL.  When updating through a view INSTEAD OF trigger,
+ *		oldtuple is passed to the triggers and identifies what to update, and
+ *		tupleid is invalid.  When updating a foreign table, tupleid is
+ *		invalid; the FDW has to figure out which row to update using data from
+ *		the planSlot.  oldtuple is passed to foreign table triggers; it is
+ *		NULL when the foreign table has no relevant triggers.
  *
  *		slot contains the new tuple value to be stored.
  *		planSlot is the output of the ModifyTable's subplan; we use it
  *		to access values from other input tables (for RETURNING),
  *		row-ID junk columns, etc.
  *
- *		Returns RETURNING result if any, otherwise NULL.
+ *		Returns RETURNING result if any, otherwise NULL.  On exit, if tupleid
+ *		had identified the tuple to update, it will identify the tuple
+ *		actually updated after EvalPlanQual.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -2717,10 +2718,10 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 
 	/*-----
 	 * If we are dealing with a WHEN MATCHED case, tupleid or oldtuple is
-	 * valid, depending on whether the result relation is a table or a view.
-	 * We execute the first action for which the additional WHEN MATCHED AND
-	 * quals pass.  If an action without quals is found, that action is
-	 * executed.
+	 * valid, depending on whether the result relation is a table or a view
+	 * having an INSTEAD OF trigger.  We execute the first action for which
+	 * the additional WHEN MATCHED AND quals pass.  If an action without quals
+	 * is found, that action is executed.
 	 *
 	 * Similarly, in the WHEN NOT MATCHED BY SOURCE case, tupleid or oldtuple
 	 * is valid, and we look at the given WHEN NOT MATCHED BY SOURCE actions
@@ -2811,8 +2812,8 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  * Check and execute the first qualifying MATCHED or NOT MATCHED BY SOURCE
  * action, depending on whether the join quals are satisfied.  If the target
  * relation is a table, the current target tuple is identified by tupleid.
- * Otherwise, if the target relation is a view, oldtuple is the current target
- * tuple from the view.
+ * Otherwise, if the target relation is a view having an INSTEAD OF trigger,
+ * oldtuple is the current target tuple from the view.
  *
  * We start from the first WHEN MATCHED or WHEN NOT MATCHED BY SOURCE action
  * and check if the WHEN quals pass, if any. If the WHEN quals for the first
@@ -2878,8 +2879,11 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
 	if (oldtuple != NULL)
+	{
+		Assert(resultRelInfo->ri_TrigDesc);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
+	}
 	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
 											tupleid,
 											SnapshotAny,
@@ -3992,8 +3996,8 @@ ExecModifyTable(PlanState *pstate)
 			 * know enough here to set t_tableOid.  Quite separately from
 			 * this, the FDW may fetch its own junk attrs to identify the row.
 			 *
-			 * Other relevant relkinds, currently limited to views, always
-			 * have a wholerow attribute.
+			 * Other relevant relkinds, currently limited to views having
+			 * INSTEAD OF triggers, always have a wholerow attribute.
 			 */
 			else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
 			{
inplace070-rel-locks-missing-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Add relation locks where existing standards would have them.
    
    The SetRelationHasSubclass() header comment says to hold this
    ShareUpdateExclusiveLock lock, and it's conventional for the sort of DDL
    done there and in IndexSetParentIndex().  SequenceChangePersistence()
    and heap_create() are deep surgery warranting AccessExclusiveLock.  This
    might cause more deadlocks.  I didn't run down exact user-visible bugs
    reachable via these gaps.  Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 922ba79..2652405 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -407,6 +407,8 @@ heap_create(const char *relname,
 	/* ensure that stats are dropped if transaction aborts */
 	pgstat_create_relation(rel);
 
+	LockRelationOid(relid, AccessExclusiveLock);
+
 	return rel;
 }
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c..5c48e57 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1058,6 +1058,7 @@ index_create(Relation heapRelation,
 	if (OidIsValid(parentIndexRelid))
 	{
 		StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
+		LockRelationOid(parentIndexRelid, ShareUpdateExclusiveLock);
 		SetRelationHasSubclass(parentIndexRelid, true);
 	}
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d9016ef..a28ce72 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4475,7 +4475,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 
 	/* set relhassubclass if an index partition has been added to the parent */
 	if (OidIsValid(parentOid))
+	{
+		LockRelationOid(parentOid, ShareUpdateExclusiveLock);
 		SetRelationHasSubclass(parentOid, true);
+	}
 
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
@@ -4527,6 +4530,7 @@ update_relispartition(Oid relationId, bool newval)
 	HeapTuple	tup;
 	Relation	classRel;
 
+	LockRelationOid(relationId, ShareUpdateExclusiveLock);
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
 	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 4610356..a60d6e7 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -545,6 +545,7 @@ SequenceChangePersistence(Oid relid, char newrelpersistence)
 	Buffer		buf;
 	HeapTupleData seqdatatuple;
 
+	LockRelationOid(relid, AccessExclusiveLock);
 	init_sequence(relid, &elm, &seqrel);
 
 	/* check the comment above nextval_internal()'s equivalent call. */
inplace080-catcache-detoast-inplace-stale-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Cope with inplace updates making catcache entries stale during detoasting.
    
    This extends ad98fb14226ae6456fbaed7990ee7591cbe5efd2 to invals of
    inplace updates.  PgDatabaseToastTable is the only one affected.
    Trouble would require something like the inplace-inval.spec test.
    Consider GRANT ... ON DATABASE fetching a stale row from cache and
    discarding a datfrozenxid update that vac_truncate_clog() has already
    relied upon.  Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME
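
    Condensed from the catcache.c hunk below, the fix amounts to snapshotting
    the tuple bytes before toast access and comparing afterward (declarations
    and surrounding control flow elided):

	HeapTuple	before = heap_copytuple(ntp);	/* bytes before detoast */
	HeapTuple	dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
	bool		matches;

	/* AcceptInvalidationMessages() may have run during toast access */
	matches = equalTuple(before, ntp);
	heap_freetuple(before);
	if (!matches || !systable_recheck_tuple(scandesc, ntp))
	{
		heap_freetuple(dtp);
		return NULL;			/* stale; caller retries the lookup */
	}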

diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 3217008..c7cb9bc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -136,6 +136,27 @@ IsCatalogRelationOid(Oid relid)
 }
 
 /*
+ * IsInplaceUpdateRelation
+ *		True iff core code performs inplace updates on the relation.
+ */
+bool
+IsInplaceUpdateRelation(Relation relation)
+{
+	return IsInplaceUpdateOid(RelationGetRelid(relation));
+}
+
+/*
+ * IsInplaceUpdateOid
+ *		Like the above, but takes an OID as argument.
+ */
+bool
+IsInplaceUpdateOid(Oid relid)
+{
+	return (relid == RelationRelationId ||
+			relid == DatabaseRelationId);
+}
+
+/*
  * IsToastRelation
  *		True iff relation is a TOAST support relation (or index).
  *
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 569f51c..a734ab6 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -2008,6 +2008,23 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
 
 
 /*
+ * equalTuple
+ *		Are these tuples memcmp()-equal?
+ */
+static bool
+equalTuple(HeapTuple a, HeapTuple b)
+{
+	uint32		alen;
+	uint32		blen;
+
+	alen = a->t_len;
+	blen = b->t_len;
+	return (alen == blen &&
+			memcmp((char *) a->t_data,
+				   (char *) b->t_data, blen) == 0);
+}
+
+/*
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
@@ -2057,14 +2074,45 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,
 		 */
 		if (HeapTupleHasExternal(ntp))
 		{
+			bool		need_cmp = IsInplaceUpdateOid(cache->cc_reloid);
+			HeapTuple	before = NULL;
+			bool		matches = true;
+
+			if (need_cmp)
+				before = heap_copytuple(ntp);
 			dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
 
 			/*
 			 * The tuple could become stale while we are doing toast table
 			 * access (since AcceptInvalidationMessages can run then), so we
-			 * must recheck its visibility afterwards.
+			 * must recheck its visibility afterwards.  The recheck shall
+			 * distinguish two cases:
+			 *
+			 * - Inval is not yet queued or not yet processed.  This implies
+			 * the current cache searcher accepts staleness in this tuple. A
+			 * future searcher that can't accept staleness will take a lock
+			 * conflicting with one the inval issuer held, so we won't miss
+			 * the inval.
+			 *
+			 * - Inval message processed during toast_flatten_tuple().  We
+			 * didn't get the memo.  The inval message processing did do
+			 * InvalidateCatalogSnapshot(), so systable_recheck_tuple() will
+			 * return false where needed.
+			 *
+			 * AcceptInvalidationMessages could have processed an inval for an
+			 * inplace update.  While this equalTuple() follows the usual rule
+			 * of reading with a pin and no buffer lock, it warrants suspicion
+			 * since an inplace update could appear at any moment.  It's safe
+			 * because the inplace update sends an invalidation that can't
+			 * reorder before the inplace data change.  If the change reaches
+			 * this process just after we look, we've not missed its inval.
 			 */
-			if (!systable_recheck_tuple(scandesc, ntp))
+			if (need_cmp)
+			{
+				matches = equalTuple(before, ntp);
+				heap_freetuple(before);
+			}
+			if (!matches || !systable_recheck_tuple(scandesc, ntp))
 			{
 				heap_freetuple(dtp);
 				return NULL;
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 1fd326e..a8dd304 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -21,11 +21,13 @@
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 extern bool IsCatalogRelation(Relation relation);
+extern bool IsInplaceUpdateRelation(Relation relation);
 
 extern bool IsSystemClass(Oid relid, Form_pg_class reltuple);
 extern bool IsToastClass(Form_pg_class reltuple);
 
 extern bool IsCatalogRelationOid(Oid relid);
+extern bool IsInplaceUpdateOid(Oid relid);
 
 extern bool IsCatalogNamespace(Oid namespaceId);
 extern bool IsToastNamespace(Oid namespaceId);
inplace090-LOCKTAG_TUPLE-eoxact-v1.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier to introduce.  Warn under debug_assertions, like we do
    for various resource leaks.  Back-patch to v12 (all supported versions),
    since that is the plan for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME
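
    The invariant amounts to keeping tuple-level locks scoped to the
    operation rather than the transaction; a minimal sketch of a conforming
    caller ("rel" and "tid" are stand-ins):

	LockTuple(rel, &tid, ExclusiveLock);
	/* ... fetch the row, modify it, heap_update() ... */
	UnlockTuple(rel, &tid, ExclusiveLock);	/* before commit, else WARNING */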

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 5154353..7229f21 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2234,6 +2234,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
inplace110-successors-v1.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions, and add LockOrStrongerHeldByMe() instead of adding a
    LockHeldByMe() parameter.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME
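
    The standard flow for the new API, condensed from the
    index_update_stats() conversion below ("relid" and "dirty" are
    stand-ins for the caller's own state):

	ScanKeyData key[1];
	HeapTuple	tuple;
	void	   *state;

	ScanKeyInit(&key[0], Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid));
	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
							 &tuple, &state);
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "could not find tuple for relation %u", relid);
	/* buffer is exclusive-locked here; scribble on the copy */
	((Form_pg_class) GETSTRUCT(tuple))->relhasindex = true;
	if (dirty)
		heap_inplace_update_finish(state, tuple);	/* sends cache inval */
	else
		heap_inplace_update_cancel(state);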

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..2a5b114 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets an
+  index's pg_class row, that lock must be on the table.  Locking the index rel
+  is optional.  (This allows VACUUM to overwrite per-index pg_class while
+  holding a lock on the table alone.)  We could allow weaker locks, in which
+  case the next paragraph would simply call for stronger locks for its class
+  of commands.  heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE
+  in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple it
+  overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4eda445..b116fe1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4069,6 +4073,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock on relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6038,34 +6081,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6078,21 +6132,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6104,6 +6207,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6124,23 +6240,188 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress.
+ * All inplace updaters take a lock that conflicts with DROP, so if a delete
+ * does happen somehow, we'll wait for it like we would an update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would tolerate either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5c48e57..00b3e4f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2788,7 +2788,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2822,33 +2824,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2911,11 +2892,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 521ee74..64b9e9d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index fe3cda2..55f5cce 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -335,32 +335,7 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
 						 relation->rd_lockInfo.lockRelId.dbId,
 						 relation->rd_lockInfo.lockRelId.relId);
 
-	if (LockHeldByMe(&tag, lockmode))
-		return true;
-
-	if (orstronger)
-	{
-		LOCKMODE	slockmode;
-
-		for (slockmode = lockmode + 1;
-			 slockmode <= MaxLockMode;
-			 slockmode++)
-		{
-			if (LockHeldByMe(&tag, slockmode))
-			{
-#ifdef NOT_USED
-				/* Sometimes this might be useful for debugging purposes */
-				elog(WARNING, "lock mode %s substituted for %s on relation %s",
-					 GetLockmodeName(tag.locktag_lockmethodid, slockmode),
-					 GetLockmodeName(tag.locktag_lockmethodid, lockmode),
-					 RelationGetRelationName(relation));
-#endif
-				return true;
-			}
-		}
-	}
-
-	return false;
+	return LockHeldByMe(&tag, lockmode, orstronger);
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 7229f21..60b746d 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -578,11 +578,17 @@ DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2)
 }
 
 /*
- * LockHeldByMe -- test whether lock 'locktag' is held with mode 'lockmode'
- *		by the current transaction
+ * LockHeldByMe -- test whether lock 'locktag' is held by the current
+ *		transaction
+ *
+ * Returns true if the current transaction holds a lock on 'locktag' of mode
+ * 'lockmode'.  If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
  */
 bool
-LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
+LockHeldByMe(const LOCKTAG *locktag,
+			 LOCKMODE lockmode, bool orstronger)
 {
 	LOCALLOCKTAG localtag;
 	LOCALLOCK  *locallock;
@@ -598,7 +604,23 @@ LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
 										  &localtag,
 										  HASH_FIND, NULL);
 
-	return (locallock && locallock->nLocks > 0);
+	if (locallock && locallock->nLocks > 0)
+		return true;
+
+	if (orstronger)
+	{
+		LOCKMODE	slockmode;
+
+		for (slockmode = lockmode + 1;
+			 slockmode <= MaxLockMode;
+			 slockmode++)
+		{
+			if (LockHeldByMe(locktag, slockmode, false))
+				return true;
+		}
+	}
+
+	return false;
 }
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c47a504..33e7134 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 0017d4b..cc1f6e7 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -567,7 +567,8 @@ extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
 extern void LockReleaseSession(LOCKMETHODID lockmethodid);
 extern void LockReleaseCurrentOwner(LOCALLOCK **locallocks, int nlocks);
 extern void LockReassignCurrentOwner(LOCALLOCK **locallocks, int nlocks);
-extern bool LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode);
+extern bool LockHeldByMe(const LOCKTAG *locktag,
+						 LOCKMODE lockmode, bool orstronger);
 #ifdef USE_ASSERT_CHECKING
 extern HTAB *GetLockMethodLocalHash(void);
 #endif
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
inplace120-locktag-v1.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME
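
    Under the new rules, a pg_database heap_update() caller takes the tuple
    lock before copying the row, roughly as follows (a sketch; "pgdbrel" and
    "oldtup" are stand-ins, and a real caller rechecks that the TID still
    holds the live tuple after locking):

	HeapTuple	newtup;

	LockTuple(pgdbrel, &oldtup->t_self, InplaceUpdateTupleLock);
	newtup = heap_copytuple(oldtup);	/* copy only after locking */
	/* ... modify newtup ... */
	CatalogTupleUpdate(pgdbrel, &newtup->t_self, newtup);
	UnlockTuple(pgdbrel, &oldtup->t_self, InplaceUpdateTupleLock);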

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 2a5b114..3e42c94 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b116fe1..d2d0c36 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3209,6 +3216,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4075,6 +4086,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock on relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6120,6 +6214,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6141,6 +6236,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6160,6 +6256,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6169,6 +6267,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6180,6 +6287,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6246,6 +6355,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6271,9 +6381,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6331,7 +6444,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 143876b..49d4d5e 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0,
@@ -2073,6 +2075,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2186,7 +2190,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2262,6 +2266,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, ownerId, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index c7cb9bc..716b751 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -146,8 +146,17 @@ IsInplaceUpdateRelation(Relation relation)
 }
 
 /*
- * IsInplaceUpdateRelation
+ * IsInplaceUpdateOid
  *		Like the above, but takes an OID as argument.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateOid(Oid relid)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a28ce72..4ce7374 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4529,15 +4529,18 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	LockRelationOid(relationId, ShareUpdateExclusiveLock);
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index de0d911..7db62cb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3695,6 +3695,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3703,9 +3704,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3713,7 +3715,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4210,6 +4213,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4232,7 +4236,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4259,7 +4264,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -15632,7 +15638,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15736,6 +15742,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -18049,7 +18056,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -18068,6 +18076,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -18080,7 +18090,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -18092,6 +18104,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 7bcc03c..fb486e7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2312,6 +2312,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2320,6 +2322,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2413,6 +2416,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2517,6 +2528,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2842,6 +2861,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2878,17 +2898,33 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
 	{
 		Assert(resultRelInfo->ri_TrigDesc);
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
 	}
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even tuples that don't match mas_whenqual, which
+			 * isn't ideal.  MERGE on system catalogs is a minor use case, so
+			 * don't bother doing better.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2960,7 +2996,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2971,7 +3007,7 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
@@ -2991,7 +3027,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3009,7 +3046,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3020,7 +3057,7 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3101,7 +3138,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3175,7 +3212,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3193,6 +3230,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3221,7 +3267,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3249,13 +3295,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3302,6 +3348,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3753,6 +3803,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4065,6 +4116,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4076,6 +4129,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4084,6 +4138,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4095,6 +4154,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 262c987..99dbb5b 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3727,6 +3727,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3769,11 +3770,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3893,9 +3895,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1() with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8bc421e..abd68e2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#14Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#13)
Re: race condition in pg_class

On Sun, May 12, 2024 at 04:29:23PM -0700, Noah Misch wrote:

I'm attaching patches implementing the LockTuple() design. It turns out we
don't just lose inplace updates. We also overwrite unrelated tuples,
reproduced at inplace.spec. Good starting points are README.tuplock and the
heap_inplace_update_scan() header comment.

About inplace050-tests-inj-v1.patch.

+	/* Check if blocked_pid is in injection_wait(). */
+	proc = BackendPidGetProc(blocked_pid);
+	if (proc == NULL)
+		PG_RETURN_BOOL(false);	/* session gone: definitely unblocked */
+	wait_event =
+		pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
+	if (wait_event && strncmp("INJECTION_POINT(",
+							  wait_event,
+							  strlen("INJECTION_POINT(")) == 0)
+		PG_RETURN_BOOL(true);

Hmm. I am not sure that this is the right interface for the job
because this is not only related to injection points but to the
monitoring of one or more wait events when running a permutation
step. Perhaps this is something that should be linked to the spec
files, with some property listing the wait events we're expected
to wait on instead when running a step that we know will wait?
--
Michael

#15Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#14)
Re: race condition in pg_class

On Mon, May 13, 2024 at 04:59:59PM +0900, Michael Paquier wrote:

About inplace050-tests-inj-v1.patch.

+	/* Check if blocked_pid is in injection_wait(). */
+	proc = BackendPidGetProc(blocked_pid);
+	if (proc == NULL)
+		PG_RETURN_BOOL(false);	/* session gone: definitely unblocked */
+	wait_event =
+		pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
+	if (wait_event && strncmp("INJECTION_POINT(",
+							  wait_event,
+							  strlen("INJECTION_POINT(")) == 0)
+		PG_RETURN_BOOL(true);

Hmm. I am not sure that this is the right interface for the job
because this is not only related to injection points but to the
monitoring of one or more wait events when running a permutation
step.

Could you say more about that? Permutation steps don't monitor wait events
today. This patch would be the first instance of that.

Perhaps this is something that should be linked to the spec
files, with some property listing the wait events we're expected
to wait on instead when running a step that we know will wait?

The spec syntax doesn't distinguish contention types at all. The isolation
tester's needs are limited to distinguishing:

(a) process is waiting on another test session
(b) process is waiting on automatic background activity (autovacuum, mainly)

Automatic background activity doesn't make a process enter or leave
injection_wait(), so all injection point wait events fall in (a). (The tester
ignores (b), since those clear up without intervention. Failing to ignore
them, as the tester did long ago, made output unstable.)
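
(For context: the tester's probe is the built-in function
pg_isolation_test_session_is_blocked(), called with the pid of the
apparently-stuck backend plus the array of all test-session pids; a true
result means case (a). The quoted snippet extends that blocked check so a
backend whose wait event starts with "INJECTION_POINT(" also counts as
case (a).)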

#16Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#13)
Re: race condition in pg_class

On Sun, May 12, 2024 at 7:29 PM Noah Misch <noah@leadboat.com> wrote:

- [consequences limited to transient failure] Since a PROC_IN_VACUUM backend's
xmin does not stop pruning, an MVCC scan in that backend can find zero
tuples when one is live. This is like what all backends got in the days of
SnapshotNow catalog scans. See the pgbench test suite addition. (Perhaps
the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
setting that flag later and unsetting it earlier.)

Are you saying that this is a problem already, or that the patch
causes it to start happening? If it's the former, that's horrible. If
it's the latter, I'd say that is a fatal defect.

--
Robert Haas
EDB: http://www.enterprisedb.com

#17Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#16)
Re: race condition in pg_class

On Mon, May 13, 2024 at 03:53:08PM -0400, Robert Haas wrote:

On Sun, May 12, 2024 at 7:29 PM Noah Misch <noah@leadboat.com> wrote:

- [consequences limited to transient failure] Since a PROC_IN_VACUUM backend's
xmin does not stop pruning, an MVCC scan in that backend can find zero
tuples when one is live. This is like what all backends got in the days of
SnapshotNow catalog scans. See the pgbench test suite addition. (Perhaps
the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
setting that flag later and unsetting it earlier.)

Are you saying that this is a problem already, or that the patch
causes it to start happening? If it's the former, that's horrible.

The former.

#18Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#13)
Re: race condition in pg_class

On Sun, May 12, 2024 at 04:29:23PM -0700, Noah Misch wrote:

I'm attaching patches implementing the LockTuple() design.

Starting 2024-06-10, I plan to push the first seven of the ten patches:

inplace005-UNEXPECTEDPASS-tap-meson-v1.patch
inplace010-tests-v1.patch
inplace040-waitfuncs-v1.patch
inplace050-tests-inj-v1.patch
inplace060-nodeModifyTable-comments-v1.patch
Those five just deal in tests, test infrastructure, and comments.
inplace070-rel-locks-missing-v1.patch
Main risk is new DDL deadlocks.
inplace080-catcache-detoast-inplace-stale-v1.patch
If it fails to fix the bug it targets, I expect it's a no-op rather than
breaking things.

I'll leave the last three of the ten needing review. Those three are beyond
my skill to self-certify.

#19Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#18)
Re: race condition in pg_class

On Wed, Jun 5, 2024 at 2:17 PM Noah Misch <noah@leadboat.com> wrote:

Starting 2024-06-10, I plan to push the first seven of the ten patches:

inplace005-UNEXPECTEDPASS-tap-meson-v1.patch
inplace010-tests-v1.patch
inplace040-waitfuncs-v1.patch
inplace050-tests-inj-v1.patch
inplace060-nodeModifyTable-comments-v1.patch
Those five just deal in tests, test infrastructure, and comments.
inplace070-rel-locks-missing-v1.patch
Main risk is new DDL deadlocks.
inplace080-catcache-detoast-inplace-stale-v1.patch
If it fails to fix the bug it targets, I expect it's a no-op rather than
breaking things.

I'll leave the last three of the ten needing review. Those three are beyond
my skill to self-certify.

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category - which they are not - instead of being a new
wait_event_type. That would have avoided the ugly wait-event naming
pattern, inconsistent with everything else, introduced by
inplace050-tests-inj-v1.patch.

I think that the comments and commit messages in this patch set could,
in some places, use improvement. For instance,
inplace060-nodeModifyTable-comments-v1.patch reflows a bunch of
comments, which makes it hard to see what actually changed, and the
commit message doesn't tell you, either. A good bit of it seems to be
changing "a view" to "a view INSTEAD OF trigger" or "a view having an
INSTEAD OF trigger," but the reasoning behind that change is not
spelled out anywhere. The reader is left to guess what the other case
is and why the same principles don't apply to it. I don't doubt that
the new comments are more correct than the old ones, but I expect
future patch authors to have difficulty maintaining that state of
affairs.

Similarly, inplace070-rel-locks-missing-v1.patch adds no comments.
IMHO, the commit message also isn't very informative. It disclaims
knowledge of what bug it's fixing, while at the same time leaving the
reader to figure out for themselves how the behavior has changed.
Consequently, I expect writing the release notes for a release
including this patch to be difficult: "We added some locks that block
... something ... in some circumstances ... to prevent ... something."
It's not really the job of the release note author to fill in those
blanks, but rather of the patch author or committer. I don't want to
overburden the act of fixing bugs, but I just feel like more
explanation is needed here. When I see for example that we're adding a
lock acquisition to the end of heap_create(), I can't help but wonder
if it's really true that we don't take a lock on a just-created
relation today. I'm certainly under the impression that we lock
newly-created, uncommitted relations, and a quick test seems to
confirm that. I don't quite know what this call is guarding against,
but evidently it's something more subtle than a categorical failure to
lock a relation on creation, so I think there should be a comment
explaining what that thing is.

It's also quite surprising that SetRelationHasSubclass() says "take X
lock before calling" and 2 of 4 callers just don't. I guess that's how
it is. But shouldn't we then have an assertion inside that function to
guard against future mistakes? If the reason why we failed to add this
initially is discernible from the commit messages that introduced the
bug, it would be nice to mention what it seems to have been; if not,
it would at least be nice to mention the offending commit(s). I'm also
a bit worried that this is going to cause deadlocks, but I suppose if
it does, that's still better than the status quo.
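
Something like this at the top of the function, say (untested sketch; I'm
assuming a CheckRelationOidLockedByMe() helper in the style of the
existing CheckRelationLockedByMe(), and guessing at the lock level the
header comment intends):

    /* caller is documented to hold a suitable lock; assert it */
    Assert(CheckRelationOidLockedByMe(relationId,
                                      ShareUpdateExclusiveLock, true));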

IsInplaceUpdateOid's header comment says IsInplaceUpdateRelation
instead of IsInplaceUpdateOid.

inplace080-catcache-detoast-inplace-stale-v1.patch seems like another
place where spelling out the rationale in more detail would be helpful
to future readers; for instance, the commit message says that
PgDatabaseToastTable is the only one affected, but it doesn't say why
the others are not, or why this one is. The lengthy comment in
CatalogCacheCreateEntry is also difficult to correlate with the code
which follows. I can't guess whether the two cases called out in the
comment always needed to be handled and were handled save only for
in-place updates, and thus the comment changes were simply taking the
opportunity to elaborate on the existing comments; or whether one of
those cases is preexisting and the other arises from the desire to
handle inplace updates. It could be helpful to mention relevant
identifiers from the code in the comment text e.g.
"systable_recheck_tuple detects ordinary updates by noting changes to
the tuple's visibility information, while the equalTuple() case
detects inplace updates."

IMHO, this patch set underscores the desirability of removing in-place
update altogether. That sounds difficult and not back-patchable, but I
can't classify what this patch set does as anything better than grotty
hacks to work around serious design deficiencies. That is not a vote
against these patches: I see no better way forward. Nonetheless, I
dislike the lack of better options.

I have done only cursory review of the last two patches and don't feel
I'm in a place to certify them, at least not now.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#19)
Re: race condition in pg_class

On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote:

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category - which they are not - instead of being a new
wait_event_type. That would have avoided the ugly wait-event naming
pattern, inconsistent with everything else, introduced by
inplace050-tests-inj-v1.patch.

I'm not sure I agree with that. The set of core backend APIs supporting
injection points has nothing to do with wait events. The library
attached to one or more injection points *may* decide to use a wait
event, as the wait/wakeup calls in modules/injection_points do, but
that's entirely optional. These rely on custom wait events, plugged
into the Extension category because the code being run is itself in an
extension. I am not arguing against the point that it may be
interesting to plug in custom wait event categories, but the current
design of wait events makes that much harder than what core can
currently handle, and I am not sure that this brings much in the end as
long as the wait event strings can be customized.
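
For reference, the module side amounts to roughly this (sketch from
memory; the event name is whatever the module chooses to register):

    /* register a custom wait event under the Extension category */
    uint32      wait_event_info =
        WaitEventExtensionNew("INJECTION_POINT(some-point)");

    pgstat_report_wait_start(wait_event_info);
    /* sleep on a condition variable until woken, then: */
    pgstat_report_wait_end();

Nothing in core forces that "INJECTION_POINT(...)" naming; it's purely a
convention of the module and the tests.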

I've voiced upthread concerns over the naming enforced by the patch
and the way it plugs the namings into the isolation functions, by the
way.
--
Michael

#21Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#20)
Re: race condition in pg_class

On Thu, Jun 6, 2024 at 7:20 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote:

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category - which they are not - instead of being a new
wait_event_type. That would have avoided the ugly wait-event naming
pattern, inconsistent with everything else, introduced by
inplace050-tests-inj-v1.patch.

I'm not sure I agree with that. The set of core backend APIs supporting
injection points has nothing to do with wait events. The library
attached to one or more injection points *may* decide to use a wait
event, as the wait/wakeup calls in modules/injection_points do, but
that's entirely optional. These rely on custom wait events, plugged
into the Extension category because the code being run is itself in an
extension. I am not arguing against the point that it may be
interesting to plug in custom wait event categories, but the current
design of wait events makes that much harder than what core can
currently handle, and I am not sure that this brings much in the end as
long as the wait event strings can be customized.

I've voiced upthread concerns over the naming enforced by the patch
and the way it plugs the namings into the isolation functions, by the
way.

I think the core code should provide an "Injection Point" wait event
type and let extensions add specific wait events there, just like you
did for "Extension". Then this ugly naming would go away. As I see it,
"Extension" is only supposed to be used as a catch-all when we have no
other information, but here we do. If we refuse to use the
wait_event_type field to categorize waits, then people are going to
have to find some other way to get that data into the system, as Noah
has done.
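
To be concrete, and entirely hypothetical: alongside the existing
PG_WAIT_EXTENSION classification, core could define something like

    /* hypothetical new wait event classification */
    #define PG_WAIT_INJECTIONPOINT  0x0B000000U

and let extensions register named events under it, just as they do under
"Extension" today. pg_stat_activity would then show wait_event_type =
'InjectionPoint', and the INJECTION_POINT(...) naming trick would be
unnecessary.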

--
Robert Haas
EDB: http://www.enterprisedb.com

#22Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#21)
Re: race condition in pg_class

On Fri, Jun 07, 2024 at 09:08:03AM -0400, Robert Haas wrote:

On Thu, Jun 6, 2024 at 7:20 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote:

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category - which they are not - instead of being a new
wait_event_type. That would have avoided the ugly wait-event naming
pattern, inconsistent with everything else, introduced by
inplace050-tests-inj-v1.patch.

I'm not sure I agree with that. The set of core backend APIs supporting
injection points has nothing to do with wait events. The library
attached to one or more injection points *may* decide to use a wait
event, as the wait/wakeup calls in modules/injection_points do, but
that's entirely optional. These rely on custom wait events, plugged
into the Extension category because the code being run is itself in an
extension. I am not arguing against the point that it may be
interesting to plug in custom wait event categories, but the current
design of wait events makes that much harder than what core can
currently handle, and I am not sure that this brings much in the end as
long as the wait event strings can be customized.

By the way, I've already voiced concerns upthread about the naming
enforced by the patch and the way it plugs those names into the
isolation functions.

I think the core code should provide an "Injection Point" wait event
type and let extensions add specific wait events there, just like you
did for "Extension".

Michael, could you accept the core code offering that, or not? If so, I am
content to implement that. If not, for injection point wait events, I have
just one priority. The isolation tester already detects lmgr locks without
the test writer teaching it about each lock individually. I want it to have
that same capability for injection points. Do you think we can find
something everyone can accept that has that property? These wait events
show up in tests only, and I'm happy to make the cosmetics anything
compatible with that detection ability.
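
For context, that detection amounts to the harness polling a query of
this shape for each apparently-stuck session (the PIDs here are
illustrative):

SELECT pg_isolation_test_session_is_blocked(8132, '{8130,8131,8132}');

Heavyweight lock waits already make this return true with no per-lock
knowledge in the spec file; the ask is that a backend sleeping in an
injection point register the same way.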

#23Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#19)
12 attachment(s)
Re: race condition in pg_class

On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote:

On Wed, Jun 5, 2024 at 2:17 PM Noah Misch <noah@leadboat.com> wrote:

Starting 2024-06-10, I plan to push the first seven of the ten patches:

inplace005-UNEXPECTEDPASS-tap-meson-v1.patch
inplace010-tests-v1.patch
inplace040-waitfuncs-v1.patch
inplace050-tests-inj-v1.patch
inplace060-nodeModifyTable-comments-v1.patch
Those five just deal in tests, test infrastructure, and comments.
inplace070-rel-locks-missing-v1.patch
Main risk is new DDL deadlocks.
inplace080-catcache-detoast-inplace-stale-v1.patch
If it fails to fix the bug it targets, I expect it's a no-op rather than
breaking things.

I'll leave the last three of the ten needing review. Those three are beyond
my skill to self-certify.

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category

I've replied on that branch of the thread.

I think that the comments and commit messages in this patch set could,
in some places, use improvement. For instance,
inplace060-nodeModifyTable-comments-v1.patch reflows a bunch of
comments, which makes it hard to see what actually changed, and the
commit message doesn't tell you, either. A good bit of it seems to be
changing "a view" to "a view INSTEAD OF trigger" or "a view having an
INSTEAD OF trigger," but the reasoning behind that change is not
spelled out anywhere. The reader is left to guess what the other case
is and why the same principles don't apply to it. I don't doubt that
the new comments are more correct than the old ones, but I expect
future patch authors to have difficulty maintaining that state of
affairs.

The two kinds are trigger-updatable views and auto-updatable views. I've
added sentences about that to the nodeModifyTable.c header comment. One could
argue for dropping the INSTEAD OF comment changes outside of the header.
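
For readers unfamiliar with the distinction, here is a minimal sketch
(all object names invented): DML on an auto-updatable view is rewritten
to target the base table, so ModifyTable never sees the view, while a
trigger-updatable view remains the ModifyTable target and its INSTEAD
OF trigger performs the change:

CREATE TABLE base_t (a int PRIMARY KEY);

-- Auto-updatable: simple enough that DML on it is rewritten to base_t.
CREATE VIEW auto_v AS SELECT a FROM base_t;

-- Not auto-updatable (GROUP BY), so updates need an INSTEAD OF trigger.
CREATE VIEW trig_v AS SELECT a FROM base_t GROUP BY a;
CREATE FUNCTION trig_v_update() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
	UPDATE base_t SET a = NEW.a WHERE a = OLD.a;
	RETURN NEW;
END $$;
CREATE TRIGGER trig_v_instead INSTEAD OF UPDATE ON trig_v
	FOR EACH ROW EXECUTE FUNCTION trig_v_update();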

Similarly, inplace070-rel-locks-missing-v1.patch adds no comments.
IMHO, the commit message also isn't very informative. It disclaims
knowledge of what bug it's fixing, while at the same time leaving the
reader to figure out for themselves how the behavior has changed.
Consequently, I expect writing the release notes for a release
including this patch to be difficult: "We added some locks that block
... something ... in some circumstances ... to prevent ... something."
It's not really the job of the release note author to fill in those
blanks, but rather of the patch author or committer. I don't want to

I had been thinking release notes should just say "Add missing DDL lock
acquisitions". One can cure a breach of our locking standards without proving
some specific bad outcome. However, one could counter that commands like
GRANT follow a different standard, and perhaps SetRelationHasSubclass() should
use the GRANT standard. Hence, I researched the bugs this fixes and split
inplace070-rel-locks-missing into three patches:

1. [inplace065-lock-SequenceChangePersistence] Lock in
SequenceChangePersistence(), where the omission can lose nextval()
increments of the sequence.

2. [inplace071-lock-SetRelationHasSubclass] Lock in SetRelationHasSubclass().
This one has only minor benefits; see the new commit message. A fair
alternative would be tuple-level locking in inplace120-locktag, like that
patch adds to GRANT. That might avoid some deadlocks. I feel like the
minor benefits justify the way I chose, but it's a weak preference.

3. [inplace075-lock-heap_create] Add to heap creation:

overburden the act of fixing bugs, but I just feel like more
explanation is needed here. When I see for example that we're adding a
lock acquisition to the end of heap_create(), I can't help but wonder
if it's really true that we don't take a lock on a just-created
relation today. I'm certainly under the impression that we lock
newly-created, uncommitted relations, and a quick test seems to
confirm that. I don't quite know what is going on here, but evidently
this call is guarding against something more subtle than a categorical
failure to lock a relation on creation, so I think there should be a
comment explaining what that thing is.

I've covered that in the new log message. To lock as early as possible, I've
moved this up a layer, to just after relid assignment. One could argue this
change belongs in inplace120 rather than its own patch, since it's only here
to eliminate a harmless exception to the rule inplace120 asserts.
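
Robert's quick test can be reproduced with a sketch along these lines
(table name invented); the lock on the uncommitted relation is already
visible from within the creating transaction:

BEGIN;
CREATE TABLE just_created ();
-- Expect a granted AccessExclusiveLock on the new relation:
SELECT locktype, mode, granted
FROM pg_locks
WHERE relation = 'just_created'::regclass;
ROLLBACK;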

I've removed the update_relispartition() that appeared in
inplace070-rel-locks-missing-v1.patch. Only an older, unpublished draft of
the rules (that inplace110-successors adds to README.tuplock) required that
lock. The lock might be worthwhile for avoiding "tuple concurrently updated",
but it's out of scope for $SUBJECT.

It's also quite surprising that SetRelationHasSubclass() says "take X
lock before calling" and 2 of 4 callers just don't. I guess that's how
it is. But shouldn't we then have an assertion inside that function to
guard against future mistakes? If the reason why we failed to add this

Works for me. Done. I've moved the LockHeldByMe() change from
inplace110-successors to this patch, since the assertion wants it.

initially is discernible from the commit messages that introduced the
bug, it would be nice to mention what it seems to have been; if not,
it would at least be nice to mention the offending commit(s). I'm also

Done.

a bit worried that this is going to cause deadlocks, but I suppose if
it does, that's still better than the status quo.

IsInplaceUpdateOid's header comment says IsInplaceUpdateRelation
instead of IsInplaceUpdateOid.

Fixed.

inplace080-catcache-detoast-inplace-stale-v1.patch seems like another
place where spelling out the rationale in more detail would be helpful
to future readers; for instance, the commit message says that
PgDatabaseToastTable is the only one affected, but it doesn't say why
the others are not, or why this one is. The lengthy comment in

I've updated the commit message to answer that.

CatalogCacheCreateEntry is also difficult to correlate with the code
which follows. I can't guess whether the two cases called out in the
comment always needed to be handled and were handled save only for
in-place updates, and thus the comment changes were simply taking the
opportunity to elaborate on the existing comments; or whether one of
those cases is preexisting and the other arises from the desire to
handle inplace updates. It could be helpful to mention relevant
identifiers from the code in the comment text e.g.
"systable_recheck_tuple detects ordinary updates by noting changes to
the tuple's visibility information, while the equalTuple() case
detects inplace updates."

The patch was elaborating on existing comments. Reading the patch again
today, the elaboration no longer feels warranted. Hence, I've rewritten that
comment addition. I've included identifiers, and the patch no longer adds
comment material orthogonal to inplace updates.

Thanks,
nm

Attachments:

inplace005-UNEXPECTEDPASS-tap-meson-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make TAP todo_start effects the same under Meson and prove_check.
    
    This could have caused spurious failures only on SPARC Linux, because
    today's only todo_start tests for that platform.  Back-patch to v16,
    where Meson support first appeared.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/tools/testwrap b/src/tools/testwrap
index d01e610..9a270be 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -41,12 +41,22 @@ env_dict = {**os.environ,
             'TESTDATADIR': os.path.join(testdir, 'data'),
             'TESTLOGDIR': os.path.join(testdir, 'log')}
 
-sp = subprocess.run(args.test_command, env=env_dict)
+sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
+# Meson categorizes a passing TODO test point as bad
+# (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
+# directive, so Meson computes the file result like Perl does.  This could
+# have the side effect of delaying stdout lines relative to stderr.  That
+# doesn't affect the log file, and the TAP protocol uses stdout only.
+for line in sp.stdout:
+    if line.startswith(b'ok '):
+        line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
+    sys.stdout.buffer.write(line)
+returncode = sp.wait()
 
-if sp.returncode == 0:
+if returncode == 0:
     print('# test succeeded')
     open(os.path.join(testdir, 'test.success'), 'x')
 else:
     print('# test failed')
     open(os.path.join(testdir, 'test.fail'), 'x')
-sys.exit(sp.returncode)
+sys.exit(returncode)
inplace010-tests-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Improve test coverage for changes to inplace-updated catalogs.
    
    This covers both regular and inplace changes, since bugs arise at their
    intersection.  Where marked, these witness extant bugs.  Back-patch to
    v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index d0a86a2..4d5f8d2 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -68,6 +68,34 @@ $node->pgbench(
 		  "CREATE TYPE pg_temp.e AS ENUM ($labels); DROP TYPE pg_temp.e;"
 	});
 
+# Test inplace updates from VACUUM concurrent with heap_update from GRANT.
+# The PROC_IN_VACUUM environment can't finish MVCC table scans consistently,
+# so this fails rarely.  To reproduce consistently, add a sleep after
+# GetCatalogSnapshot(non-catalog-rel).
+Test::More->builder->todo_start('PROC_IN_VACUUM scan breakage');
+$node->safe_psql('postgres', 'CREATE TABLE ddl_target ()');
+$node->pgbench(
+	'--no-vacuum --client=5 --protocol=prepared --transactions=50',
+	0,
+	[qr{processed: 250/250}],
+	[qr{^$}],
+	'concurrent GRANT/VACUUM',
+	{
+		'001_pgbench_grant@9' => q(
+			DO $$
+			BEGIN
+				PERFORM pg_advisory_xact_lock(42);
+				FOR i IN 1 .. 10 LOOP
+					GRANT SELECT ON ddl_target TO PUBLIC;
+					REVOKE SELECT ON ddl_target FROM PUBLIC;
+				END LOOP;
+			END
+			$$;
+),
+		'001_pgbench_vacuum_ddl_target@1' => "VACUUM ddl_target;",
+	});
+Test::More->builder->todo_end;
+
 # Trigger various connection errors
 $node->pgbench(
 	'no-such-database',
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 0237271..032d420 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -1337,3 +1337,29 @@ a|b|c|   d
 2|2|2|1004
 (2 rows)
 
+
+starting permutation: sys1 sysupd2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysupd2: 
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+ <waiting ...>
+step c1: COMMIT;
+step sysupd2: <... completed>
+step c2: COMMIT;
+
+starting permutation: sys1 sysmerge2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysmerge2: 
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+ <waiting ...>
+step c1: COMMIT;
+step sysmerge2: <... completed>
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/inplace-inval.out b/src/test/isolation/expected/inplace-inval.out
new file mode 100644
index 0000000..67b34ad
--- /dev/null
+++ b/src/test/isolation/expected/inplace-inval.out
@@ -0,0 +1,32 @@
+Parsed test spec with 3 sessions
+
+starting permutation: cachefill3 cir1 cic2 ddl3 read1
+step cachefill3: TABLE newly_indexed;
+c
+-
+(0 rows)
+
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: cir1 cic2 ddl3 read1
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+t          
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
new file mode 100644
index 0000000..432ece5
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -0,0 +1,28 @@
+Parsed test spec with 3 sessions
+
+starting permutation: snap3 b1 grant1 vac2 snap3 c1 cmp3
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step b1: BEGIN;
+step grant1: 
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+
+step vac2: VACUUM (FREEZE);
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step c1: COMMIT;
+step cmp3: 
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+
+?column?              
+----------------------
+datfrozenxid retreated
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..cc1e47a
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,225 @@
+Parsed test spec with 5 sessions
+
+starting permutation: b1 grant1 read2 addk2 c1 read2
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: keyshr5 addk2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+
+starting permutation: keyshr5 b3 sfnku3 addk2 r3
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfnku3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+
+starting permutation: b2 sfnku2 addk2 c2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: keyshr5 b2 sfnku2 addk2 c2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b2 sfnku2 b1 grant1 addk2 c2 c1 read2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b1 grant1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+f          
+(1 row)
+
+s4: WARNING:  got: tuple concurrently updated
+step revoke4: <... completed>
+step r3: ROLLBACK;
+
+starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step drop1: 
+	DROP TABLE intra_grant_inplace;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+(0 rows)
+
+s4: WARNING:  got: tuple concurrently deleted
+step revoke4: <... completed>
+step r3: ROLLBACK;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0342eb3..6da98cf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,9 @@ test: fk-snapshot
 test: subxid-overflow
 test: eval-plan-qual
 test: eval-plan-qual-trigger
+test: inplace-inval
+test: intra-grant-inplace
+test: intra-grant-inplace-db
 test: lock-update-delete
 test: lock-update-traversal
 test: inherit-temp
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index edd6d19..3a74406 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,6 +194,12 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
+# test system class updates
+
+step sys1	{
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+}
+
 
 session s2
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
@@ -282,6 +288,18 @@ step wnested2 {
     );
 }
 
+step sysupd2	{
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+}
+
+step sysmerge2	{
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+}
+
 step c2	{ COMMIT; }
 step r2	{ ROLLBACK; }
 
@@ -380,3 +398,6 @@ permutation simplepartupdate complexpartupdate c1 c2 read_part
 permutation simplepartupdate_route1to2 complexpartupdate_route_err1 c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_route c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_doesnt_route c1 c2 read_part
+
+permutation sys1 sysupd2 c1 c2
+permutation sys1 sysmerge2 c1 c2
diff --git a/src/test/isolation/specs/inplace-inval.spec b/src/test/isolation/specs/inplace-inval.spec
new file mode 100644
index 0000000..d8e1c98
--- /dev/null
+++ b/src/test/isolation/specs/inplace-inval.spec
@@ -0,0 +1,38 @@
+# If a heap_update() caller retrieves its oldtup from a cache, it's possible
+# for that cache entry to predate an inplace update, causing loss of that
+# inplace update.  This arises because the transaction may abort before
+# sending the inplace invalidation message to the shared queue.
+
+setup
+{
+	CREATE TABLE newly_indexed (c int);
+}
+
+teardown
+{
+	DROP TABLE newly_indexed;
+}
+
+session s1
+step cir1	{ BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK; }
+step read1	{
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+}
+
+session s2
+step cic2	{ CREATE INDEX i2 ON newly_indexed (c); }
+
+session s3
+step cachefill3	{ TABLE newly_indexed; }
+step ddl3		{ ALTER TABLE newly_indexed ADD extra int; }
+
+
+permutation
+	cachefill3	# populates the pg_class row in the catcache
+	cir1	# sets relhasindex=true; rollback discards cache inval
+	cic2	# sees relhasindex=true, skips changing it (so no inval)
+	ddl3	# cached row as the oldtup of an update, losing relhasindex
+	read1	# observe damage XXX is an extant bug
+
+# without cachefill3, no bug
+permutation cir1 cic2 ddl3 read1
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
new file mode 100644
index 0000000..bbecd5d
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -0,0 +1,46 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, namely
+# datfrozenxid, need special code to cope.
+
+setup
+{
+	CREATE ROLE regress_temp_grantee;
+}
+
+teardown
+{
+	REVOKE ALL ON DATABASE isolation_regression FROM regress_temp_grantee;
+	DROP ROLE regress_temp_grantee;
+}
+
+# heap_update(pg_database)
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step vac2	{ VACUUM (FREEZE); }
+
+# observe datfrozenxid
+session s3
+setup	{
+	CREATE TEMP TABLE frozen_witness (x xid);
+}
+step snap3	{
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+}
+step cmp3	{
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+}
+
+
+# XXX extant bug
+permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..3cd696b
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,153 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  Inplace updates, such as
+# relhasindex=true, need special code to cope.
+
+setup
+{
+	CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+	DROP TABLE IF EXISTS intra_grant_inplace;
+}
+
+# heap_update()
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+}
+step drop1	{
+	DROP TABLE intra_grant_inplace;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step read2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+}
+step b2		{ BEGIN; }
+step addk2	{ ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+step sfnku2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step c2		{ COMMIT; }
+
+# rowmarks
+session s3
+step b3		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
+step sfnku3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step sfu3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+}
+step r3	{ ROLLBACK; }
+
+# Additional heap_update()
+session s4
+# swallow error message to keep any OID value out of expected output
+step revoke4	{
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+}
+
+# Additional rowmarks
+session s5
+setup	{ BEGIN; }
+step keyshr5	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+}
+teardown	{ ROLLBACK; }
+
+
+# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+
+permutation
+	b1
+	grant1
+	read2
+	addk2(c1)	# inplace waits
+	c1
+	read2
+
+# inplace thru KEY SHARE
+permutation
+	keyshr5
+	addk2
+
+# inplace wait NO KEY UPDATE w/ KEY SHARE
+permutation
+	keyshr5
+	b3
+	sfnku3
+	addk2(r3)
+	r3
+
+# same-xact rowmark
+permutation
+	b2
+	sfnku2
+	addk2
+	c2
+
+# same-xact rowmark in multixact
+permutation
+	keyshr5
+	b2
+	sfnku2
+	addk2
+	c2
+
+permutation
+	b3
+	sfu3
+	b1
+	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
+	read2
+	addk2(c1)	# block in LockTuple() behind grant1
+	r3			# unblock grant1; addk2 now awaits grant1 xmax
+	c1
+	read2
+
+permutation
+	b2
+	sfnku2
+	b1
+	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
+	addk2			# block in LockTuple() behind grant1 = deadlock
+	c2
+	c1
+	read2
+
+# SearchSysCacheLocked1() calling LockRelease()
+permutation
+	b1
+	grant1
+	b3
+	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1
+	r3			# revoke4 unlocks old tuple and finds new
+
+# SearchSysCacheLocked1() finding a tuple, then no tuple
+permutation
+	b1
+	drop1
+	b3
+	sfu3(c1)		# acquire LockTuple(), await drop1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1				# sfu3 locks none; revoke4 unlocks old and finds none
+	r3
diff --git a/src/test/regress/expected/database.out b/src/test/regress/expected/database.out
new file mode 100644
index 0000000..30c0865
--- /dev/null
+++ b/src/test/regress/expected/database.out
@@ -0,0 +1,14 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index eddc1f4..3d33259 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -2691,6 +2691,30 @@ drop cascades to table measurement_y2007m01
 DROP FUNCTION measurement_insert_trigger();
 -- prepare
 RESET SESSION AUTHORIZATION;
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+    oid    
+-----------
+ pg_depend
+(1 row)
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+ oid  
+------
+ 2608
+(1 row)
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 675c567..ddc155c 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -28,7 +28,7 @@ test: strings md5 numerology point lseg line box path polygon circle date time t
 # geometry depends on point, lseg, line, box, path, polygon, circle
 # horology depends on date, time, timetz, timestamp, timestamptz, interval
 # ----------
-test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc
+test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database
 
 # ----------
 # Load huge amounts of data
diff --git a/src/test/regress/sql/database.sql b/src/test/regress/sql/database.sql
new file mode 100644
index 0000000..6c61f2e
--- /dev/null
+++ b/src/test/regress/sql/database.sql
@@ -0,0 +1,16 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql
index 3d5d854..92163ec 100644
--- a/src/test/regress/sql/merge.sql
+++ b/src/test/regress/sql/merge.sql
@@ -1713,6 +1713,23 @@ DROP FUNCTION measurement_insert_trigger();
 -- prepare
 
 RESET SESSION AUTHORIZATION;
+
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
inplace040-waitfuncs-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Create waitfuncs.c for pg_isolation_test_session_is_blocked().
    
    The next commit makes the function inspect an additional non-lock
    contention source, so it no longer fits in lockfuncs.c.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 610ccf2..edb09d4 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -116,6 +116,7 @@ OBJS = \
 	varchar.o \
 	varlena.o \
 	version.o \
+	waitfuncs.o \
 	windowfuncs.o \
 	xid.o \
 	xid8funcs.o \
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 13009cc..e790f85 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -13,7 +13,6 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
-#include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/predicate_internals.h"
@@ -602,84 +601,6 @@ pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS)
 
 
 /*
- * pg_isolation_test_session_is_blocked - support function for isolationtester
- *
- * Check if specified PID is blocked by any of the PIDs listed in the second
- * argument.  Currently, this looks for blocking caused by waiting for
- * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
- * not directly under the isolationtester's control, eg autovacuum.
- *
- * This is an undocumented function intended for use by the isolation tester,
- * and may change in future releases as required for testing purposes.
- */
-Datum
-pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
-{
-	int			blocked_pid = PG_GETARG_INT32(0);
-	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
-	ArrayType  *blocking_pids_a;
-	int32	   *interesting_pids;
-	int32	   *blocking_pids;
-	int			num_interesting_pids;
-	int			num_blocking_pids;
-	int			dummy;
-	int			i,
-				j;
-
-	/* Validate the passed-in array */
-	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
-	if (array_contains_nulls(interesting_pids_a))
-		elog(ERROR, "array must not contain nulls");
-	interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
-	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
-										  ARR_DIMS(interesting_pids_a));
-
-	/*
-	 * Get the PIDs of all sessions blocking the given session's attempt to
-	 * acquire heavyweight locks.
-	 */
-	blocking_pids_a =
-		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
-
-	Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
-	Assert(!array_contains_nulls(blocking_pids_a));
-	blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
-	num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
-									   ARR_DIMS(blocking_pids_a));
-
-	/*
-	 * Check if any of these are in the list of interesting PIDs, that being
-	 * the sessions that the isolation tester is running.  We don't use
-	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
-	 * our goals is to run quickly with debug_discard_caches > 0.  We expect
-	 * blocking_pids to be usually empty and otherwise a very small number in
-	 * isolation tester cases, so make that the outer loop of a naive search
-	 * for a match.
-	 */
-	for (i = 0; i < num_blocking_pids; i++)
-		for (j = 0; j < num_interesting_pids; j++)
-		{
-			if (blocking_pids[i] == interesting_pids[j])
-				PG_RETURN_BOOL(true);
-		}
-
-	/*
-	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
-	 * theory check the resulting array of blocker PIDs against the
-	 * interesting PIDs list, but since there is no danger of autovacuum
-	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
-	 * on allocating a buffer and searching for overlap; so it's presently
-	 * sufficient for the isolation tester's purposes to use a single element
-	 * buffer and check if the number of safe snapshot blockers is non-zero.
-	 */
-	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
-		PG_RETURN_BOOL(true);
-
-	PG_RETURN_BOOL(false);
-}
-
-
-/*
  * Functions for manipulating advisory locks
  *
  * We make use of the locktag fields as follows:
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 48dbcf5..8c6fc80 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -103,6 +103,7 @@ backend_sources += files(
   'varchar.c',
   'varlena.c',
   'version.c',
+  'waitfuncs.c',
   'windowfuncs.c',
   'xid.c',
   'xid8funcs.c',
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
new file mode 100644
index 0000000..d9c92c3
--- /dev/null
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -0,0 +1,96 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitfuncs.c
+ *		Functions for SQL access to syntheses of multiple contention types.
+ *
+ * Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/backend/utils/adt/waitfuncs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "storage/predicate_internals.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+
+
+/*
+ * pg_isolation_test_session_is_blocked - support function for isolationtester
+ *
+ * Check if specified PID is blocked by any of the PIDs listed in the second
+ * argument.  Currently, this looks for blocking caused by waiting for
+ * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
+ * not directly under the isolationtester's control, eg autovacuum.
+ *
+ * This is an undocumented function intended for use by the isolation tester,
+ * and may change in future releases as required for testing purposes.
+ */
+Datum
+pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
+{
+	int			blocked_pid = PG_GETARG_INT32(0);
+	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
+	ArrayType  *blocking_pids_a;
+	int32	   *interesting_pids;
+	int32	   *blocking_pids;
+	int			num_interesting_pids;
+	int			num_blocking_pids;
+	int			dummy;
+	int			i,
+				j;
+
+	/* Validate the passed-in array */
+	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
+	if (array_contains_nulls(interesting_pids_a))
+		elog(ERROR, "array must not contain nulls");
+	interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
+	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
+										  ARR_DIMS(interesting_pids_a));
+
+	/*
+	 * Get the PIDs of all sessions blocking the given session's attempt to
+	 * acquire heavyweight locks.
+	 */
+	blocking_pids_a =
+		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
+
+	Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
+	Assert(!array_contains_nulls(blocking_pids_a));
+	blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
+	num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
+									   ARR_DIMS(blocking_pids_a));
+
+	/*
+	 * Check if any of these are in the list of interesting PIDs, that being
+	 * the sessions that the isolation tester is running.  We don't use
+	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
+	 * our goals is to run quickly with debug_discard_caches > 0.  We expect
+	 * blocking_pids to be usually empty and otherwise a very small number in
+	 * isolation tester cases, so make that the outer loop of a naive search
+	 * for a match.
+	 */
+	for (i = 0; i < num_blocking_pids; i++)
+		for (j = 0; j < num_interesting_pids; j++)
+		{
+			if (blocking_pids[i] == interesting_pids[j])
+				PG_RETURN_BOOL(true);
+		}
+
+	/*
+	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
+	 * theory check the resulting array of blocker PIDs against the
+	 * interesting PIDs list, but since there is no danger of autovacuum
+	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
+	 * on allocating a buffer and searching for overlap; so it's presently
+	 * sufficient for the isolation tester's purposes to use a single element
+	 * buffer and check if the number of safe snapshot blockers is non-zero.
+	 */
+	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
+		PG_RETURN_BOOL(true);
+
+	PG_RETURN_BOOL(false);
+}
inplace050-tests-inj-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Add an injection_points isolation test suite.
    
    Make the isolation harness recognize injection_points wait events as a
    type of blocked state.  To simplify that, change that wait event naming
    scheme to INJECTION_POINT(name).  Add an initial test for an extant
    inplace-update bug.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4be0dee..4eda445 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -63,6 +63,7 @@
 #include "storage/procarray.h"
 #include "storage/standby.h"
 #include "utils/datum.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
@@ -6077,6 +6078,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+	INJECTION_POINT("inplace-before-pin");
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 	page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
index d9c92c3..b524a8a 100644
--- a/src/backend/utils/adt/waitfuncs.c
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -14,8 +14,13 @@
 
 #include "catalog/pg_type.h"
 #include "storage/predicate_internals.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/wait_event.h"
+
+#define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
 
 
 /*
@@ -23,8 +28,9 @@
  *
  * Check if specified PID is blocked by any of the PIDs listed in the second
  * argument.  Currently, this looks for blocking caused by waiting for
- * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
- * not directly under the isolationtester's control, eg autovacuum.
+ * injection points, heavyweight locks, or safe snapshots.  We ignore blockage
+ * caused by PIDs not directly under the isolationtester's control, eg
+ * autovacuum.
  *
  * This is an undocumented function intended for use by the isolation tester,
  * and may change in future releases as required for testing purposes.
@@ -34,6 +40,8 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 {
 	int			blocked_pid = PG_GETARG_INT32(0);
 	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
+	PGPROC	   *proc;
+	const char *wait_event;
 	ArrayType  *blocking_pids_a;
 	int32	   *interesting_pids;
 	int32	   *blocking_pids;
@@ -43,6 +51,17 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 	int			i,
 				j;
 
+	/* Check if blocked_pid is in injection_wait(). */
+	proc = BackendPidGetProc(blocked_pid);
+	if (proc == NULL)
+		PG_RETURN_BOOL(false);	/* session gone: definitely unblocked */
+	wait_event =
+		pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
+	if (wait_event && strncmp("INJECTION_POINT(",
+							  wait_event,
+							  strlen("INJECTION_POINT(")) == 0)
+		PG_RETURN_BOOL(true);
+
 	/* Validate the passed-in array */
 	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
 	if (array_contains_nulls(interesting_pids_a))
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 31bd787..2ffd2f7 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -9,6 +9,8 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
+ISOLATION = inplace
+
 # The injection points are cluster-wide, so disable installcheck
 NO_INSTALLCHECK = 1
 
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
new file mode 100644
index 0000000..123f45a
--- /dev/null
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -0,0 +1,43 @@
+Parsed test spec with 3 sessions
+
+starting permutation: vac1 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+ERROR:  could not create unique index "pg_class_oid_index"
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 5c44625..4193061 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -200,6 +200,7 @@ injection_notice(const char *name, const void *private_data)
 void
 injection_wait(const char *name, const void *private_data)
 {
+	char		event_name[NAMEDATALEN];
 	uint32		old_wait_counts = 0;
 	int			index = -1;
 	uint32		injection_wait_event = 0;
@@ -212,11 +213,11 @@ injection_wait(const char *name, const void *private_data)
 		return;
 
 	/*
-	 * Use the injection point name for this custom wait event.  Note that
-	 * this custom wait event name is not released, but we don't care much for
-	 * testing as this should be short-lived.
+	 * Note that this custom wait event name is not released, but we don't
+	 * care much for testing as this should be short-lived.
 	 */
-	injection_wait_event = WaitEventExtensionNew(name);
+	snprintf(event_name, sizeof(event_name), "INJECTION_POINT(%s)", name);
+	injection_wait_event = WaitEventExtensionNew(event_name);
 
 	/*
 	 * Find a free slot to wait for, and register this injection point's name.
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 8e1b5b4..3c23c14 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -37,4 +37,9 @@ tests += {
     # The injection points are cluster-wide, so disable installcheck
     'runningcheck': false,
   },
+  'isolation': {
+    'specs': [
+      'inplace',
+    ],
+  },
 }
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
new file mode 100644
index 0000000..e957713
--- /dev/null
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -0,0 +1,83 @@
+# Test race conditions involving:
+# - s1: VACUUM inplace-updating a pg_class row
+# - s2: GRANT/REVOKE making pg_class rows dead
+# - s3: "VACUUM pg_class" making dead rows LP_UNUSED; DDL reusing them
+
+# Need GRANT to make a non-HOT update.  Otherwise, "VACUUM pg_class" would
+# leave an LP_REDIRECT that persists.  To get non-HOT, make rels so the
+# pg_class row for vactest.orig50 is on a filled page (assuming BLCKSZ=8192).
+# Just to save on filesystem syscalls, use relkind=c for every other rel.
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA vactest;
+	CREATE FUNCTION vactest.mkrels(text, int, int) RETURNS void
+		LANGUAGE plpgsql SET search_path = vactest AS $$
+	DECLARE
+		tname text;
+	BEGIN
+		FOR i in $2 .. $3 LOOP
+			tname := $1 || i;
+			EXECUTE FORMAT('CREATE TYPE ' || tname || ' AS ()');
+			RAISE DEBUG '% at %', tname, ctid
+				FROM pg_class WHERE oid = tname::regclass;
+		END LOOP;
+	END
+	$$;
+}
+setup	{ VACUUM FULL pg_class;  -- reduce free space }
+setup
+{
+	SELECT vactest.mkrels('orig', 1, 49);
+	CREATE TABLE vactest.orig50 ();
+	SELECT vactest.mkrels('orig', 51, 100);
+}
+
+# XXX DROP causes an assertion failure; adopt DROP once fixed
+teardown
+{
+	--DROP SCHEMA vactest CASCADE;
+	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP EXTENSION injection_points;
+}
+
+# Wait during inplace update, in a VACUUM of vactest.orig50.
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('inplace-before-pin', 'wait');
+}
+step vac1	{ VACUUM vactest.orig50;  -- wait during inplace update }
+# One bug scenario leaves two live pg_class tuples for vactest.orig50 and zero
+# live tuples for one of the "intruder" rels.  REINDEX observes the duplicate.
+step read1	{
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+}
+
+
+# Transactional updates of the tuple vac1 is waiting to inplace-update.
+session s2
+step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
+
+
+# Non-blocking actions.
+session s3
+step vac3		{ VACUUM pg_class; }
+# Reuse the lp that vac1 is waiting to change.  I've observed reuse at the 1st
+# or 18th CREATE, so create excess.
+step mkrels3	{
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+}
+
+
+# XXX extant bug
+permutation
+	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	grant2			# T0 becomes eligible for pruning, T1 is successor
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	read1
diff --git a/src/test/modules/test_misc/t/005_timeouts.pl b/src/test/modules/test_misc/t/005_timeouts.pl
index a792610..18c800c 100644
--- a/src/test/modules/test_misc/t/005_timeouts.pl
+++ b/src/test/modules/test_misc/t/005_timeouts.pl
@@ -50,7 +50,8 @@ $psql_session->query_until(
 
 # Wait until the backend enters the timeout injection point. Will get an error
 # here if anything goes wrong.
-$node->wait_for_event('client backend', 'transaction-timeout');
+$node->wait_for_event('client backend',
+	'INJECTION_POINT(transaction-timeout)');
 
 my $log_offset = -s $node->logfile;
 
@@ -86,7 +87,7 @@ $psql_session->query_until(
 
 # Wait until the backend enters the timeout injection point.
 $node->wait_for_event('client backend',
-	'idle-in-transaction-session-timeout');
+	'INJECTION_POINT(idle-in-transaction-session-timeout)');
 
 $log_offset = -s $node->logfile;
 
@@ -116,7 +117,8 @@ $psql_session->query_until(
 ));
 
 # Wait until the backend enters the timeout injection point.
-$node->wait_for_event('client backend', 'idle-session-timeout');
+$node->wait_for_event('client backend',
+	'INJECTION_POINT(idle-session-timeout)');
 
 $log_offset = -s $node->logfile;
 
diff --git a/src/test/recovery/t/041_checkpoint_at_promote.pl b/src/test/recovery/t/041_checkpoint_at_promote.pl
index 7c30731..538facb 100644
--- a/src/test/recovery/t/041_checkpoint_at_promote.pl
+++ b/src/test/recovery/t/041_checkpoint_at_promote.pl
@@ -78,7 +78,8 @@ $node_primary->wait_for_replay_catchup($node_standby);
 
 # Wait until the checkpointer is in the middle of the restart point
 # processing.
-$node_standby->wait_for_event('checkpointer', 'create-restart-point');
+$node_standby->wait_for_event('checkpointer',
+	'INJECTION_POINT(create-restart-point)');
 
 # Check the logs that the restart point has started on standby.  This is
 # optional, but let's be sure.
inplace060-nodeModifyTable-comments-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Expand comments and add an assertion in nodeModifyTable.c.
    
    Most comments concern RELKIND_VIEW.  One addresses the ExecUpdate()
    "tupleid" parameter.  A later commit will rely on these facts, but they
    hold already.  Back-patch to v12 (all supported versions), the plan for
    that commit.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3..a2442b7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -24,6 +24,14 @@
  *		values plus row-locating info for UPDATE and MERGE cases, or just the
  *		row-locating info for DELETE cases.
  *
+ *		The relation to modify can be an ordinary table, a view having an
+ *		INSTEAD OF trigger, or a foreign table.  Earlier processing already
+ *		pointed ModifyTable to the underlying relations of any automatically
+ *		updatable view not using an INSTEAD OF trigger, so code here can
+ *		assume it won't have one as a modification target.  This node does
+ *		process ri_WithCheckOptions, which may have expressions from those
+ *		automatically updatable views.
+ *
  *		MERGE runs a join between the source relation and the target table.
  *		If any WHEN NOT MATCHED [BY TARGET] clauses are present, then the join
  *		is an outer join that might output tuples without a matching target
@@ -1398,18 +1406,18 @@ ExecDeleteEpilogue(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  *		DELETE is like UPDATE, except that we delete the tuple and no
  *		index modifications are needed.
  *
- *		When deleting from a table, tupleid identifies the tuple to
- *		delete and oldtuple is NULL.  When deleting from a view,
- *		oldtuple is passed to the INSTEAD OF triggers and identifies
- *		what to delete, and tupleid is invalid.  When deleting from a
- *		foreign table, tupleid is invalid; the FDW has to figure out
- *		which row to delete using data from the planSlot.  oldtuple is
- *		passed to foreign table triggers; it is NULL when the foreign
- *		table has no relevant triggers.  We use tupleDeleted to indicate
- *		whether the tuple is actually deleted, callers can use it to
- *		decide whether to continue the operation.  When this DELETE is a
- *		part of an UPDATE of partition-key, then the slot returned by
- *		EvalPlanQual() is passed back using output parameter epqreturnslot.
+ *		When deleting from a table, tupleid identifies the tuple to delete and
+ *		oldtuple is NULL.  When deleting through a view INSTEAD OF trigger,
+ *		oldtuple is passed to the triggers and identifies what to delete, and
+ *		tupleid is invalid.  When deleting from a foreign table, tupleid is
+ *		invalid; the FDW has to figure out which row to delete using data from
+ *		the planSlot.  oldtuple is passed to foreign table triggers; it is
+ *		NULL when the foreign table has no relevant triggers.  We use
+ *		tupleDeleted to indicate whether the tuple is actually deleted,
+ *		callers can use it to decide whether to continue the operation.  When
+ *		this DELETE is a part of an UPDATE of partition-key, then the slot
+ *		returned by EvalPlanQual() is passed back using output parameter
+ *		epqreturnslot.
  *
  *		Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
@@ -2238,21 +2246,22 @@ ExecCrossPartitionUpdateForeignKey(ModifyTableContext *context,
  *		is, we don't want to get stuck in an infinite loop
  *		which corrupts your database..
  *
- *		When updating a table, tupleid identifies the tuple to
- *		update and oldtuple is NULL.  When updating a view, oldtuple
- *		is passed to the INSTEAD OF triggers and identifies what to
- *		update, and tupleid is invalid.  When updating a foreign table,
- *		tupleid is invalid; the FDW has to figure out which row to
- *		update using data from the planSlot.  oldtuple is passed to
- *		foreign table triggers; it is NULL when the foreign table has
- *		no relevant triggers.
+ *		When updating a table, tupleid identifies the tuple to update and
+ *		oldtuple is NULL.  When updating through a view INSTEAD OF trigger,
+ *		oldtuple is passed to the triggers and identifies what to update, and
+ *		tupleid is invalid.  When updating a foreign table, tupleid is
+ *		invalid; the FDW has to figure out which row to update using data from
+ *		the planSlot.  oldtuple is passed to foreign table triggers; it is
+ *		NULL when the foreign table has no relevant triggers.
  *
  *		slot contains the new tuple value to be stored.
  *		planSlot is the output of the ModifyTable's subplan; we use it
  *		to access values from other input tables (for RETURNING),
  *		row-ID junk columns, etc.
  *
- *		Returns RETURNING result if any, otherwise NULL.
+ *		Returns RETURNING result if any, otherwise NULL.  On exit, if tupleid
+ *		had identified the tuple to update, it will identify the tuple
+ *		actually updated after EvalPlanQual.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -2717,10 +2726,10 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 
 	/*-----
 	 * If we are dealing with a WHEN MATCHED case, tupleid or oldtuple is
-	 * valid, depending on whether the result relation is a table or a view.
-	 * We execute the first action for which the additional WHEN MATCHED AND
-	 * quals pass.  If an action without quals is found, that action is
-	 * executed.
+	 * valid, depending on whether the result relation is a table or a view
+	 * having an INSTEAD OF trigger.  We execute the first action for which
+	 * the additional WHEN MATCHED AND quals pass.  If an action without quals
+	 * is found, that action is executed.
 	 *
 	 * Similarly, in the WHEN NOT MATCHED BY SOURCE case, tupleid or oldtuple
 	 * is valid, and we look at the given WHEN NOT MATCHED BY SOURCE actions
@@ -2811,8 +2820,8 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  * Check and execute the first qualifying MATCHED or NOT MATCHED BY SOURCE
  * action, depending on whether the join quals are satisfied.  If the target
  * relation is a table, the current target tuple is identified by tupleid.
- * Otherwise, if the target relation is a view, oldtuple is the current target
- * tuple from the view.
+ * Otherwise, if the target relation is a view having an INSTEAD OF trigger,
+ * oldtuple is the current target tuple from the view.
  *
  * We start from the first WHEN MATCHED or WHEN NOT MATCHED BY SOURCE action
  * and check if the WHEN quals pass, if any. If the WHEN quals for the first
@@ -2878,8 +2887,11 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
 	if (oldtuple != NULL)
+	{
+		Assert(resultRelInfo->ri_TrigDesc);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
+	}
 	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
 											tupleid,
 											SnapshotAny,
@@ -3992,8 +4004,8 @@ ExecModifyTable(PlanState *pstate)
 			 * know enough here to set t_tableOid.  Quite separately from
 			 * this, the FDW may fetch its own junk attrs to identify the row.
 			 *
-			 * Other relevant relkinds, currently limited to views, always
-			 * have a wholerow attribute.
+			 * Other relevant relkinds, currently limited to views having
+			 * INSTEAD OF triggers, always have a wholerow attribute.
 			 */
 			else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
 			{
Attachment: inplace065-lock-SequenceChangePersistence-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Lock owned sequences during ALTER TABLE SET { LOGGED | UNLOGGED }.
    
    These commands already make the persistence of owned sequences follow
    owned table persistence changes.  They didn't lock those sequences.
    They lost the effect of nextval() calls that other sessions make after
    the ALTER TABLE command, before the ALTER TABLE transaction commits.
    Fix by acquiring the same lock that ALTER SEQUENCE SET { LOGGED |
    UNLOGGED } acquires.  This might cause more deadlocks.  Back-patch to
    v15, where commit 344d62fb9a978a72cf8347f0369b9ee643fd0b31 introduced
    unlogged sequences.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 4610356..d020867 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -545,6 +545,13 @@ SequenceChangePersistence(Oid relid, char newrelpersistence)
 	Buffer		buf;
 	HeapTupleData seqdatatuple;
 
+	/*
+	 * ALTER SEQUENCE acquires this lock earlier.  If we're processing an
+	 * owned sequence for ALTER TABLE, lock now.  Without the lock, we'd
+	 * discard increments from nextval() calls (in other sessions) between
+	 * this function's buffer unlock and this transaction's commit.
+	 */
+	LockRelationOid(relid, AccessExclusiveLock);
 	init_sequence(relid, &elm, &seqrel);
 
 	/* check the comment above nextval_internal()'s equivalent call. */
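
For concreteness, a minimal sketch of what the same fix would look like
done at a call site instead (hypothetical function name; with the patch
as written, the lock is taken inside SequenceChangePersistence() itself,
so ALTER TABLE callers need nothing extra):

#include "postgres.h"
#include "commands/sequence.h"
#include "storage/lmgr.h"

/*
 * Hypothetical illustration only.  nextval() takes RowExclusiveLock on
 * the sequence, so the AccessExclusiveLock below makes concurrent
 * nextval() calls wait until this transaction commits, instead of
 * applying increments to the old relfilenode that the persistence
 * change is about to discard.
 */
static void
change_owned_seq_persistence(Oid seqoid, char newrelpersistence)
{
	LockRelationOid(seqoid, AccessExclusiveLock);
	SequenceChangePersistence(seqoid, newrelpersistence);
}
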
Attachment: inplace071-lock-SetRelationHasSubclass-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Lock before setting relhassubclass on RELKIND_PARTITIONED_INDEX.
    
    Commit 5b562644fec696977df4a82790064e8287927891 added a comment that
    SetRelationHasSubclass() callers must hold this lock.  When commit
    17f206fbc824d2b4b14480199ca9ff7dea417eda extended use of this column to
    partitioned indexes, it didn't take the lock.  As the latter commit
    message mentioned, we currently never reset a partitioned index to
    relhassubclass=f.  That largely avoids harm from the lock omission.  The
    reason to fix this now is to unblock introducing a rule about locks
    required to heap_update() a pg_class row.  This might cause more
    deadlocks.  It gives minor user-visible benefits:
    
    - If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE
      ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks
      instead of failing with "tuple concurrently updated".  (Many cases of
      DDL concurrency still fail that way.)
    
    - Match ALTER INDEX ATTACH PARTITION in choosing to lock the index.
    
    While not user-visible today, we'll need this if we ever make something
    set the flag to false for a partitioned index, like ANALYZE does today
    for tables.  Back-patch to v12 (all supported versions), the plan for
    the commit relying on the new rule.  In back branches, add
    LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c..5c48e57 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1058,6 +1058,7 @@ index_create(Relation heapRelation,
 	if (OidIsValid(parentIndexRelid))
 	{
 		StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
+		LockRelationOid(parentIndexRelid, ShareUpdateExclusiveLock);
 		SetRelationHasSubclass(parentIndexRelid, true);
 	}
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d9016ef..716e0e8 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4475,7 +4475,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 
 	/* set relhassubclass if an index partition has been added to the parent */
 	if (OidIsValid(parentOid))
+	{
+		LockRelationOid(parentOid, ShareUpdateExclusiveLock);
 		SetRelationHasSubclass(parentOid, true);
+	}
 
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index de0d911..7b66c57 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3581,8 +3581,15 @@ findAttrByName(const char *attributeName, const List *columns)
  * SetRelationHasSubclass
  *		Set the value of the relation's relhassubclass field in pg_class.
  *
- * NOTE: caller must be holding an appropriate lock on the relation.
- * ShareUpdateExclusiveLock is sufficient.
+ * It's always safe to set this field to true, because all SQL commands are
+ * ready to see true and then find no children.  On the other hand, commands
+ * generally assume zero children if this is false.
+ *
+ * Caller must hold any self-exclusive lock until end of transaction.  If the
+ * new value is false, caller must have acquired that lock before reading the
+ * evidence that justified the false value.  That way, it properly waits if
+ * another backend is simultaneously concluding no need to change the tuple
+ * (new and old values are true).
  *
  * NOTE: an important side-effect of this operation is that an SI invalidation
  * message is sent out to all backends --- including me --- causing plans
@@ -3597,6 +3604,11 @@ SetRelationHasSubclass(Oid relationId, bool relhassubclass)
 	HeapTuple	tuple;
 	Form_pg_class classtuple;
 
+	Assert(CheckRelationOidLockedByMe(relationId,
+									  ShareUpdateExclusiveLock, false) ||
+		   CheckRelationOidLockedByMe(relationId,
+									  ShareRowExclusiveLock, true));
+
 	/*
 	 * Fetch a modifiable copy of the tuple, modify it, update pg_class.
 	 */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index fe3cda2..094522a 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -335,32 +335,22 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
 						 relation->rd_lockInfo.lockRelId.dbId,
 						 relation->rd_lockInfo.lockRelId.relId);
 
-	if (LockHeldByMe(&tag, lockmode))
-		return true;
+	return LockHeldByMe(&tag, lockmode, orstronger);
+}
 
-	if (orstronger)
-	{
-		LOCKMODE	slockmode;
+/*
+ *		CheckRelationOidLockedByMe
+ *
+ * Like the above, but takes an OID as argument.
+ */
+bool
+CheckRelationOidLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+	LOCKTAG		tag;
 
-		for (slockmode = lockmode + 1;
-			 slockmode <= MaxLockMode;
-			 slockmode++)
-		{
-			if (LockHeldByMe(&tag, slockmode))
-			{
-#ifdef NOT_USED
-				/* Sometimes this might be useful for debugging purposes */
-				elog(WARNING, "lock mode %s substituted for %s on relation %s",
-					 GetLockmodeName(tag.locktag_lockmethodid, slockmode),
-					 GetLockmodeName(tag.locktag_lockmethodid, lockmode),
-					 RelationGetRelationName(relation));
-#endif
-				return true;
-			}
-		}
-	}
+	SetLocktagRelationOid(&tag, relid);
 
-	return false;
+	return LockHeldByMe(&tag, lockmode, orstronger);
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 5154353..1bf3e6d 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -578,11 +578,17 @@ DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2)
 }
 
 /*
- * LockHeldByMe -- test whether lock 'locktag' is held with mode 'lockmode'
- *		by the current transaction
+ * LockHeldByMe -- test whether lock 'locktag' is held by the current
+ *		transaction
+ *
+ * Returns true if current transaction holds a lock on 'tag' of mode
+ * 'lockmode'.  If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
  */
 bool
-LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
+LockHeldByMe(const LOCKTAG *locktag,
+			 LOCKMODE lockmode, bool orstronger)
 {
 	LOCALLOCKTAG localtag;
 	LOCALLOCK  *locallock;
@@ -598,7 +604,23 @@ LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
 										  &localtag,
 										  HASH_FIND, NULL);
 
-	return (locallock && locallock->nLocks > 0);
+	if (locallock && locallock->nLocks > 0)
+		return true;
+
+	if (orstronger)
+	{
+		LOCKMODE	slockmode;
+
+		for (slockmode = lockmode + 1;
+			 slockmode <= MaxLockMode;
+			 slockmode++)
+		{
+			if (LockHeldByMe(locktag, slockmode, false))
+				return true;
+		}
+	}
+
+	return false;
 }
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 22b7856..ce15125 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,8 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
 extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
 extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
 									bool orstronger);
+extern bool CheckRelationOidLockedByMe(Oid relid, LOCKMODE lockmode,
+									   bool orstronger);
 extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
 
 extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 0017d4b..cc1f6e7 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -567,7 +567,8 @@ extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
 extern void LockReleaseSession(LOCKMETHODID lockmethodid);
 extern void LockReleaseCurrentOwner(LOCALLOCK **locallocks, int nlocks);
 extern void LockReassignCurrentOwner(LOCALLOCK **locallocks, int nlocks);
-extern bool LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode);
+extern bool LockHeldByMe(const LOCKTAG *locktag,
+						 LOCKMODE lockmode, bool orstronger);
 #ifdef USE_ASSERT_CHECKING
 extern HTAB *GetLockMethodLocalHash(void);
 #endif
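
Since back branches get a LockOrStrongerHeldByMe() helper instead of the
extra LockHeldByMe() parameter, here is a minimal sketch of that
variant, assuming the pre-patch two-argument LockHeldByMe():

#include "postgres.h"
#include "storage/lock.h"

/*
 * Hypothetical back-branch helper: does the current transaction hold
 * 'lockmode', or any numerically higher (stronger) mode, on 'locktag'?
 * This mirrors the orstronger loop folded into LockHeldByMe() above.
 */
static bool
LockOrStrongerHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
{
	LOCKMODE	mode;

	for (mode = lockmode; mode <= MaxLockMode; mode++)
	{
		if (LockHeldByMe(locktag, mode))	/* pre-patch two-arg form */
			return true;
	}
	return false;
}
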
Attachment: inplace075-lock-heap_create-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    AccessExclusiveLock new relations just after assigning the OID.
    
    This has no important user-visible consequences, since other sessions'
    catalog scans can't find the relation until we commit.  However, this
    unblocks introducing a rule about locks required to heap_update() a
    pg_class row.  CREATE TABLE has been acquiring this lock eventually, but
    it can heap_update() pg_class.relchecks earlier.  create_toast_table()
    has been acquiring only ShareLock.  Back-patch to v12 (all supported
    versions), the plan for the commit relying on the new rule.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 922ba79..fbad1d2 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1250,6 +1250,13 @@ heap_create_with_catalog(const char *relname,
 	}
 
 	/*
+	 * Other sessions' catalog scans can't find this until we commit.  Hence,
+	 * it doesn't hurt to hold AccessExclusiveLock.  Do it here so callers
+	 * can't accidentally vary in their lock mode or acquisition timing.
+	 */
+	LockRelationOid(relid, AccessExclusiveLock);
+
+	/*
 	 * Determine the relation's initial permissions.
 	 */
 	if (use_user_acl)
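
A quick way to see the new invariant from a caller's side (hedged
sketch; CheckRelationOidLockedByMe() comes from the inplace071 patch
above):

#include "postgres.h"
#include "storage/lmgr.h"
#include "storage/lockdefs.h"

/*
 * Hypothetical assertion a heap_create_with_catalog() caller could now
 * make: the new relation's OID is already locked, uniformly, before any
 * later pg_class heap_update() (e.g. of relchecks) happens.
 */
static void
assert_new_rel_locked(Oid relid)
{
	Assert(CheckRelationOidLockedByMe(relid, AccessExclusiveLock, false));
}
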
Attachment: inplace080-catcache-detoast-inplace-stale-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Cope with inplace update making catcache stale during TOAST fetch.
    
    This extends ad98fb14226ae6456fbaed7990ee7591cbe5efd2 to invals of
    inplace updates.  Trouble requires an inplace update of a catalog having
    a TOAST table, so only pg_database was at risk.  (The other catalog on
    which core code performs inplace updates, pg_class, has no TOAST table.)
    Trouble would require something like the inplace-inval.spec test.
    Consider GRANT ... ON DATABASE fetching a stale row from cache and
    discarding a datfrozenxid update that vac_truncate_clog() has already
    relied upon.  Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 3217008..6c39434 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -136,6 +136,27 @@ IsCatalogRelationOid(Oid relid)
 }
 
 /*
+ * IsInplaceUpdateRelation
+ *		True iff core code performs inplace updates on the relation.
+ */
+bool
+IsInplaceUpdateRelation(Relation relation)
+{
+	return IsInplaceUpdateOid(RelationGetRelid(relation));
+}
+
+/*
+ * IsInplaceUpdateOid
+ *		Like the above, but takes an OID as argument.
+ */
+bool
+IsInplaceUpdateOid(Oid relid)
+{
+	return (relid == RelationRelationId ||
+			relid == DatabaseRelationId);
+}
+
+/*
  * IsToastRelation
  *		True iff relation is a TOAST support relation (or index).
  *
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 569f51c..98aa527 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -2008,6 +2008,23 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
 
 
 /*
+ * equalTuple
+ *		Are these tuples memcmp()-equal?
+ */
+static bool
+equalTuple(HeapTuple a, HeapTuple b)
+{
+	uint32		alen;
+	uint32		blen;
+
+	alen = a->t_len;
+	blen = b->t_len;
+	return (alen == blen &&
+			memcmp((char *) a->t_data,
+				   (char *) b->t_data, blen) == 0);
+}
+
+/*
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
@@ -2057,14 +2074,34 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,
 		 */
 		if (HeapTupleHasExternal(ntp))
 		{
+			bool		need_cmp = IsInplaceUpdateOid(cache->cc_reloid);
+			HeapTuple	before = NULL;
+			bool		matches = true;
+
+			if (need_cmp)
+				before = heap_copytuple(ntp);
 			dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
 
 			/*
 			 * The tuple could become stale while we are doing toast table
-			 * access (since AcceptInvalidationMessages can run then), so we
-			 * must recheck its visibility afterwards.
+			 * access (since AcceptInvalidationMessages can run then).
+			 * equalTuple() detects staleness from inplace updates, while
+			 * systable_recheck_tuple() detects staleness from normal updates.
+			 *
+			 * While this equalTuple() follows the usual rule of reading with
+			 * a pin and no buffer lock, it warrants suspicion since an
+			 * inplace update could appear at any moment.  It's safe because
+			 * the inplace update sends an invalidation that can't reorder
+			 * before the inplace heap change.  If the heap change reaches
+			 * this process just after equalTuple() looks, we've not missed
+			 * its inval.
 			 */
-			if (!systable_recheck_tuple(scandesc, ntp))
+			if (need_cmp)
+			{
+				matches = equalTuple(before, ntp);
+				heap_freetuple(before);
+			}
+			if (!matches || !systable_recheck_tuple(scandesc, ntp))
 			{
 				heap_freetuple(dtp);
 				return NULL;
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 1fd326e..a8dd304 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -21,11 +21,13 @@
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 extern bool IsCatalogRelation(Relation relation);
+extern bool IsInplaceUpdateRelation(Relation relation);
 
 extern bool IsSystemClass(Oid relid, Form_pg_class reltuple);
 extern bool IsToastClass(Form_pg_class reltuple);
 
 extern bool IsCatalogRelationOid(Oid relid);
+extern bool IsInplaceUpdateOid(Oid relid);
 
 extern bool IsCatalogNamespace(Oid namespaceId);
 extern bool IsToastNamespace(Oid namespaceId);
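
Pulling the catcache.c hunk together into one self-contained sketch
(hypothetical helper name; error handling and the non-TOAST path
omitted):

#include "postgres.h"
#include "access/genam.h"
#include "access/heaptoast.h"
#include "access/htup_details.h"
#include "catalog/catalog.h"

/*
 * Hypothetical consolidation of the hunk above: flatten a catalog tuple
 * with external values, detecting both inplace-update staleness (byte
 * comparison) and normal-update staleness (visibility recheck).
 * Returns NULL if stale; the caller retries the lookup.
 */
static HeapTuple
flatten_unless_stale(Oid cc_reloid, TupleDesc tupdesc,
					 HeapTuple ntp, SysScanDesc scandesc)
{
	bool		need_cmp = IsInplaceUpdateOid(cc_reloid);
	HeapTuple	before = NULL;
	HeapTuple	dtp;
	bool		matches = true;

	if (need_cmp)
		before = heap_copytuple(ntp);	/* bytes as of this instant */
	dtp = toast_flatten_tuple(ntp, tupdesc);	/* invals may arrive here */
	if (need_cmp)
	{
		/* equalTuple() from the patch: t_len and memcmp of t_data */
		matches = (before->t_len == ntp->t_len &&
				   memcmp(before->t_data, ntp->t_data, ntp->t_len) == 0);
		heap_freetuple(before);
	}
	if (!matches || !systable_recheck_tuple(scandesc, ntp))
	{
		heap_freetuple(dtp);
		return NULL;
	}
	return dtp;
}
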
Attachment: inplace090-LOCKTAG_TUPLE-eoxact-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 1bf3e6d..60b746d 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
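
The discipline the warning polices is just balanced
LockTuple()/UnlockTuple() within the row's processing (hedged sketch;
InplaceUpdateTupleLock, the ExclusiveLock alias, arrives with a later
patch in this series):

#include "postgres.h"
#include "storage/lmgr.h"

/*
 * Hypothetical illustration: tuple-level heavyweight locks are
 * transient, released per row, never carried to commit.  With the
 * patch, forgetting the UnlockTuple() draws a WARNING from
 * LockReleaseAll() under debug_assertions.
 */
static void
update_row_with_tuple_lock(Relation rel, ItemPointer tid)
{
	LockTuple(rel, tid, ExclusiveLock);
	/* ... refetch the row, modify a copy, heap_update() it ... */
	UnlockTuple(rel, tid, ExclusiveLock);
}
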
Attachment: inplace110-successors-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
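
As a reading aid for the pg_class heap_update() rule above, a hedged
sketch of a caller opting for LOCKTAG_TUPLE; SearchSysCacheLocked1() is
planned rather than present in this patch, and the retry loop real code
needs when the row's TID changes between lookup and LockTuple() is
reduced to an error here:

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/indexing.h"
#include "catalog/pg_class.h"
#include "storage/lmgr.h"
#include "utils/syscache.h"

/* Hypothetical pg_class updater following the rule above. */
static void
set_relhasindex(Relation pg_class, Oid relid, bool value)
{
	HeapTuple	tup;
	ItemPointerData tid;

	/* find the row's TID, without any tuple lock yet */
	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for relation %u", relid);
	tid = tup->t_self;
	heap_freetuple(tup);

	/* take the tuple lock, then copy the row to modify */
	LockTuple(pg_class, &tid, ExclusiveLock);
	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tup) ||
		!ItemPointerEquals(&tup->t_self, &tid))
		elog(ERROR, "concurrent update; real code would retry");
	((Form_pg_class) GETSTRUCT(tup))->relhasindex = value;
	CatalogTupleUpdate(pg_class, &tup->t_self, tup);
	UnlockTuple(pg_class, &tid, ExclusiveLock);
	heap_freetuple(tup);
}
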
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4eda445..f1d4fc0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4069,6 +4073,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6038,34 +6081,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6078,21 +6132,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6104,6 +6207,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6124,23 +6240,188 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress,
+ * but all inplace updaters take a lock that conflicts with DROP.  If it
+ * does happen somehow, we'll wait for it like we would an update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would tolerate either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5c48e57..00b3e4f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2788,7 +2788,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2822,33 +2824,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2911,11 +2892,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 521ee74..64b9e9d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c47a504..33e7134 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
Attachment: inplace120-locktag-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
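
To make the protocol concrete, here is a minimal caller-side sketch of the
pattern the patch establishes for a transactional pg_class update.  It is
illustrative only, modeled on the update_relispartition() hunk below; the
function name and the relhastriggers tweak are placeholders, not part of the
patch:

    #include "postgres.h"

    #include "access/htup_details.h"
    #include "access/table.h"
    #include "catalog/indexing.h"
    #include "catalog/pg_class.h"
    #include "storage/lmgr.h"
    #include "utils/syscache.h"

    static void
    update_pg_class_row(Oid relid)
    {
        Relation    classRel = table_open(RelationRelationId, RowExclusiveLock);
        HeapTuple   tup;
        ItemPointerData otid;

        /* Take LOCKTAG_TUPLE before copying the old tuple ... */
        tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relid));
        if (!HeapTupleIsValid(tup))
            elog(ERROR, "cache lookup failed for relation %u", relid);
        otid = tup->t_self;     /* heap_update() changes tup->t_self */

        /* ... modify the copy ... */
        ((Form_pg_class) GETSTRUCT(tup))->relhastriggers = true;

        /* ... then update, and unlock the old TID afterward. */
        CatalogTupleUpdate(classRel, &otid, tup);
        UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);

        heap_freetuple(tup);
        table_close(classRel, RowExclusiveLock);
    }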

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f1d4fc0..248af401 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3209,6 +3216,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4075,6 +4086,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6120,6 +6214,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6141,6 +6236,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6160,6 +6256,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6169,6 +6267,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6180,6 +6287,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6246,6 +6355,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6271,9 +6381,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6331,7 +6444,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 143876b..49d4d5e 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0,
@@ -2073,6 +2075,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2186,7 +2190,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2262,6 +2266,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, ownerId, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 716e0e8..56c0c93 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4529,14 +4529,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7b66c57..b79980a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3707,6 +3707,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3715,9 +3716,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3725,7 +3727,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4222,6 +4225,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4244,7 +4248,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4271,7 +4276,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -15644,7 +15650,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15748,6 +15754,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -18061,7 +18068,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -18080,6 +18088,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -18092,7 +18102,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -18104,6 +18115,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a2442b7..b70d2f6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2320,6 +2320,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2328,6 +2330,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2421,6 +2424,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2525,6 +2536,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2850,6 +2869,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2886,17 +2906,33 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
 	{
 		Assert(resultRelInfo->ri_TrigDesc);
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
 	}
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even tuples that don't match mas_whenqual, which
+			 * isn't ideal.  MERGE on system catalogs is a minor use case, so
+			 * don't bother doing better.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2968,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2979,7 +3015,7 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
@@ -2999,7 +3035,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3017,7 +3054,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3028,7 +3065,7 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3109,7 +3146,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3183,7 +3220,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3201,6 +3238,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3229,7 +3275,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3257,13 +3303,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3310,6 +3356,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3761,6 +3811,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4073,6 +4124,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4084,6 +4137,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4092,6 +4146,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4103,6 +4162,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 262c987..99dbb5b 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3727,6 +3727,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3769,11 +3770,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3893,9 +3895,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLockedCopy1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8bc421e..abd68e2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#24Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#22)
Re: race condition in pg_class

On Mon, Jun 10, 2024 at 07:19:27PM -0700, Noah Misch wrote:
> On Fri, Jun 07, 2024 at 09:08:03AM -0400, Robert Haas wrote:
>> I think the core code should provide an "Injection Point" wait event
>> type and let extensions add specific wait events there, just like you
>> did for "Extension".
>
> Michael, could you accept the core code offering that, or not? If so, I am
> content to implement that. If not, for injection point wait events, I have
> just one priority. The isolation tester already detects lmgr locks without
> the test writer teaching it about each lock individually. I want it to have
> that same capability for injection points. Do you think we can find something
> everyone can accept, having that property? These wait events show up in tests
> only, and I'm happy to make the cosmetics be anything compatible with that
> detection ability.

Adding a wait event class for injection points is an interesting
suggestion that would simplify the detection in the isolation function
quite a bit. Are you sure that this is something fit for v17
material? TBH, I am not sure.

In the end, the test coverage has the highest priority, and the bugs
you are addressing are complex enough that isolation tests of this
level are a necessity, so I don't object to the naming dependency that
inplace050-tests-inj-v2.patch introduces on HEAD for the time being.
I'll just adapt and live with that depending on what I deal with,
while trying to improve HEAD later on.

I'm still wondering if there is something that could be more elegant
than a dedicated class for injection points, but I cannot think of
anything better for isolation tests off the top of my head. If there
is something I can think of, I'll just go and implement it :)
--
Michael

#25Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Michael Paquier (#24)
Re: race condition in pg_class

Hello, everyone.

I am not sure, but I think this issue may be related to the one described in
/messages/by-id/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com

It looks like REINDEX CONCURRENTLY could interfere with ON CONFLICT UPDATE
in some strange way.

Best regards,
Mikhail.

#26Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#24)
Re: race condition in pg_class

On Tue, Jun 11, 2024 at 01:37:21PM +0900, Michael Paquier wrote:
> On Mon, Jun 10, 2024 at 07:19:27PM -0700, Noah Misch wrote:
>> On Fri, Jun 07, 2024 at 09:08:03AM -0400, Robert Haas wrote:
>>> I think the core code should provide an "Injection Point" wait event
>>> type and let extensions add specific wait events there, just like you
>>> did for "Extension".
>>
>> Michael, could you accept the core code offering that, or not? If so, I am
>> content to implement that. If not, for injection point wait events, I have
>> just one priority. The isolation tester already detects lmgr locks without
>> the test writer teaching it about each lock individually. I want it to have
>> that same capability for injection points. Do you think we can find something
>> everyone can accept, having that property? These wait events show up in tests
>> only, and I'm happy to make the cosmetics be anything compatible with that
>> detection ability.
>
> Adding a wait event class for injection points is an interesting
> suggestion that would simplify the detection in the isolation function
> quite a bit. Are you sure that this is something fit for v17
> material? TBH, I am not sure.

If I were making a list of changes always welcome post-beta, it wouldn't
include adding wait event types. But I don't hesitate to add one if it
unblocks a necessary test for a bug present in all versions.

In the end, the test coverage has the highest priority, and the bugs
you are addressing are complex enough that isolation tests of this
level are a necessity, so I don't object to what
inplace050-tests-inj-v2.patch introduces with the naming dependency on
HEAD for the time being.  I'll just adapt and live with that depending
on what I deal with, while trying to improve HEAD later on.

Here's what I'm reading for each person's willingness to tolerate each option:

STRATEGY | Paquier | Misch | Haas
--------------------------------------------------------
new "Injection Point" wait type | maybe | yes | yes
INJECTION_POINT(...) naming | yes | yes | unknown
isolation spec says event names | yes | no | unknown

Corrections and additional strategy lines welcome. Robert, how do you judge
the lines where I've listed you as "unknown"?

I'm still wondering if there is something that could be more elegant
than a dedicated class for injection points, but I cannot think of
anything better for isolation tests off the top of my head.  If
something comes to mind, I'll just go and implement it :)

I once considered changing them to use advisory lock waits instead of
ConditionVariableSleep(), but I recall that was worse from the perspective of
injection points in critical sections.
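
For reference, the ConditionVariableSleep() shape is roughly the
following (a simplified sketch of the injection_points module's wait
loop; "state", "cv" and "wakeup_requested" stand in for the module's
real shared-memory fields):

/* needs storage/condition_variable.h and utils/wait_event.h */
uint32		wait_event_info = WaitEventExtensionNew(name);

ConditionVariablePrepareToSleep(&state->cv);
while (!state->wakeup_requested)
	ConditionVariableSleep(&state->cv, wait_event_info);
ConditionVariableCancelSleep();

Whichever naming we settle on changes only the first of those lines.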

#27Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#26)
Re: race condition in pg_class

On Wed, Jun 12, 2024 at 1:54 PM Noah Misch <noah@leadboat.com> wrote:

If I were making a list of changes always welcome post-beta, it wouldn't
include adding wait event types. But I don't hesitate to add one if it
unblocks a necessary test for a bug present in all versions.

However, injection points themselves are not present in all versions,
so even if we invent a new wait-event type, we'll have difficulty
testing older versions, unless we're planning to back-patch all of
that infrastructure, which I assume we aren't.

Personally, I think the fact that injection point wait events were put
under Extension is a design mistake that should be corrected before 17
is out of beta.

Here's what I'm reading for each person's willingness to tolerate each option:

STRATEGY | Paquier | Misch | Haas
--------------------------------------------------------
new "Injection Point" wait type | maybe | yes | yes
INJECTION_POINT(...) naming | yes | yes | unknown
isolation spec says event names | yes | no | unknown

Corrections and additional strategy lines welcome. Robert, how do you judge
the lines where I've listed you as "unknown"?

I'd tolerate INJECTION_POINT() if we had no other option but I think
it's clearly inferior. Does the last line refer to putting the
specific wait event names in the isolation spec file? If so, I'd also
be fine with that.

--
Robert Haas
EDB: http://www.enterprisedb.com

#28Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#27)
Re: race condition in pg_class

On Wed, Jun 12, 2024 at 02:08:31PM -0400, Robert Haas wrote:

On Wed, Jun 12, 2024 at 1:54 PM Noah Misch <noah@leadboat.com> wrote:

If I were making a list of changes always welcome post-beta, it wouldn't
include adding wait event types. But I don't hesitate to add one if it
unblocks a necessary test for a bug present in all versions.

However, injection points themselves are not present in all versions,
so even if we invent a new wait-event type, we'll have difficulty
testing older versions, unless we're planning to back-patch all of
that infrastructure, which I assume we aren't.

Right. We could put the injection point tests in v18 only instead of v17+v18.
I feel that would be an overreaction to a dispute about names that show up
only in tests. Even so, I could accept that.

Personally, I think the fact that injection point wait events were put
under Extension is a design mistake that should be corrected before 17
is out of beta.

Works for me. I don't personally have a problem with the use of Extension,
since it is a src/test/modules extension creating them.

Here's what I'm reading for each person's willingness to tolerate each option:

STRATEGY | Paquier | Misch | Haas
--------------------------------------------------------
new "Injection Point" wait type | maybe | yes | yes
INJECTION_POINT(...) naming | yes | yes | unknown
isolation spec says event names | yes | no | unknown

Corrections and additional strategy lines welcome. Robert, how do you judge
the lines where I've listed you as "unknown"?

I'd tolerate INJECTION_POINT() if we had no other option but I think
it's clearly inferior. Does the last line refer to putting the
specific wait event names in the isolation spec file? If so, I'd also
be fine with that.

Yes, the last line does refer to that. Updated table:

STRATEGY | Paquier | Misch | Haas
--------------------------------------------------------
new "Injection Point" wait type | maybe | yes | yes
INJECTION_POINT(...) naming | yes | yes | no
isolation spec says event names | yes | no | yes

I find that's adequate support for the first line. If there are no objections
in the next 24hr, I will implement that.

#29Noah Misch
noah@leadboat.com
In reply to: Michail Nikolaev (#25)
Re: race condition in pg_class

On Wed, Jun 12, 2024 at 03:02:43PM +0200, Michail Nikolaev wrote:

I am not sure, but I think this issue may be related to the one described in
/messages/by-id/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com

It looks like REINDEX CONCURRENTLY could interfere with ON CONFLICT UPDATE
in some strange way.

Can you say more about the connection you see between $SUBJECT and that? That
looks like a valid report of an important bug, but I'm not following the
potential relationship to $SUBJECT.

On your other thread, it would be useful to see stack traces from the high-CPU
processes once the live lock has stopped all queries from completing.

#30Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Noah Misch (#29)
Re: race condition in pg_class

Hello!

Can you say more about the connection you see between $SUBJECT and that? That
looks like a valid report of an important bug, but I'm not following the
potential relationship to $SUBJECT.

I was guided by the following logic:
* A pg_class race condition can cause table indexes to look stale.
* REINDEX updates indexes.
* The errors can be explained by different backends using different
  arbiter indexes.

On your other thread, it would be useful to see stack traces from the high-CPU
processes once the live lock has stopped all queries from completing.

Will do.

Best regards,
Mikhail.

#31Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#28)
Re: race condition in pg_class

On Wed, Jun 12, 2024 at 12:32:23PM -0700, Noah Misch wrote:

On Wed, Jun 12, 2024 at 02:08:31PM -0400, Robert Haas wrote:

Personally, I think the fact that injection point wait events were put
under Extension is a design mistake that should be corrected before 17
is out of beta.

Well, isolation tests and the way to wait for specific points in them
are something I thought about when working on the initial injection
point infrastructure, but all my ideas came down to the fact that this
is not specific to injection points: I've also wanted to be able to
make an isolation test wait for a specific wait event (class, name).
A hardcoded sleep is an example.  Even though I discourage anything
like that in the in-core tests, because they're slow on fast machines
and can be unreliable on slow machines, it is a fact that they are
used by out-of-core code and that extension developers find them
acceptable.

Works for me. I don't personally have a problem with the use of Extension,
since it is a src/test/modules extension creating them.

That's the original reason why Extension was used in this case: the
points are assigned in an extension.

Yes, the last line does refer to that. Updated table:

STRATEGY | Paquier | Misch | Haas
--------------------------------------------------------
new "Injection Point" wait type | maybe | yes | yes
INJECTION_POINT(...) naming | yes | yes | no
isolation spec says event names | yes | no | yes

I find that's adequate support for the first line. If there are no objections
in the next 24hr, I will implement that.

OK. That sounds like a consensus to me, useful enough for the cases
at hand.
--
Michael

#32Noah Misch
noah@leadboat.com
In reply to: Michail Nikolaev (#30)
Re: race condition in pg_class

On Wed, Jun 12, 2024 at 10:02:00PM +0200, Michail Nikolaev wrote:

Can you say more about the connection you see between $SUBJECT and that? That
looks like a valid report of an important bug, but I'm not following the
potential relationship to $SUBJECT.

I was guided by the following logic:
* A pg_class race condition can cause table indexes to look stale.
* REINDEX updates indexes.
* The errors can be explained by different backends using different
  arbiter indexes.

Got it. The race condition of $SUBJECT involves inplace updates, and the
wrong content becomes permanent. Hence, I suspect they're unrelated.

#33Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#23)
13 attachment(s)
Re: race condition in pg_class

On Mon, Jun 10, 2024 at 07:45:25PM -0700, Noah Misch wrote:

On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote:

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category

I've replied on that branch of the thread.

I think the attached covers all comments to date. I gave everything v3, but
most patches have just a no-conflict rebase vs. v2. The exceptions are
inplace031-inj-wait-event (implements the holding from that branch of the
thread) and inplace050-tests-inj (updated to cooperate with inplace031). Much
of inplace031-inj-wait-event is essentially s/Extension/Custom/ for the
infrastructure common to the two custom wait event types.
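
For a caller, the visible change is just which constructor it uses; a
minimal usage sketch (the event name here is made up, and
WaitEventInjectionPointNew() is the entry point inplace031 adds):

/* register, or re-find, a custom wait event under the new type */
uint32		wait_event_info = WaitEventInjectionPointNew("test-injection-point");

pgstat_report_wait_start(wait_event_info);
/* ... sleep on a latch or condition variable until the test wakes us ... */
pgstat_report_wait_end();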

Thanks,
nm

Attachments:

inplace005-UNEXPECTEDPASS-tap-meson-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make TAP todo_start effects the same under Meson and prove_check.
    
    This could have caused spurious failures only on SPARC Linux, because
    today's only todo_start tests for that platform.  Back-patch to v16,
    where Meson support first appeared.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/tools/testwrap b/src/tools/testwrap
index d01e610..9a270be 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -41,12 +41,22 @@ env_dict = {**os.environ,
             'TESTDATADIR': os.path.join(testdir, 'data'),
             'TESTLOGDIR': os.path.join(testdir, 'log')}
 
-sp = subprocess.run(args.test_command, env=env_dict)
+sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
+# Meson categorizes a passing TODO test point as bad
+# (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
+# directive, so Meson computes the file result like Perl does.  This could
+# have the side effect of delaying stdout lines relative to stderr.  That
+# doesn't affect the log file, and the TAP protocol uses stdout only.
+for line in sp.stdout:
+    if line.startswith(b'ok '):
+        line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
+    sys.stdout.buffer.write(line)
+returncode = sp.wait()
 
-if sp.returncode == 0:
+if returncode == 0:
     print('# test succeeded')
     open(os.path.join(testdir, 'test.success'), 'x')
 else:
     print('# test failed')
     open(os.path.join(testdir, 'test.fail'), 'x')
-sys.exit(sp.returncode)
+sys.exit(returncode)
inplace010-tests-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Improve test coverage for changes to inplace-updated catalogs.
    
    This covers both regular and inplace changes, since bugs arise at their
    intersection.  Where marked, these witness extant bugs.  Back-patch to
    v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 59ea538..956e290 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -68,6 +68,34 @@ $node->pgbench(
 		  "CREATE TYPE pg_temp.e AS ENUM ($labels); DROP TYPE pg_temp.e;"
 	});
 
+# Test inplace updates from VACUUM concurrent with heap_update from GRANT.
+# The PROC_IN_VACUUM environment can't finish MVCC table scans consistently,
+# so this fails rarely.  To reproduce consistently, add a sleep after
+# GetCatalogSnapshot(non-catalog-rel).
+Test::More->builder->todo_start('PROC_IN_VACUUM scan breakage');
+$node->safe_psql('postgres', 'CREATE TABLE ddl_target ()');
+$node->pgbench(
+	'--no-vacuum --client=5 --protocol=prepared --transactions=50',
+	0,
+	[qr{processed: 250/250}],
+	[qr{^$}],
+	'concurrent GRANT/VACUUM',
+	{
+		'001_pgbench_grant@9' => q(
+			DO $$
+			BEGIN
+				PERFORM pg_advisory_xact_lock(42);
+				FOR i IN 1 .. 10 LOOP
+					GRANT SELECT ON ddl_target TO PUBLIC;
+					REVOKE SELECT ON ddl_target FROM PUBLIC;
+				END LOOP;
+			END
+			$$;
+),
+		'001_pgbench_vacuum_ddl_target@1' => "VACUUM ddl_target;",
+	});
+Test::More->builder->todo_end;
+
 # Trigger various connection errors
 $node->pgbench(
 	'no-such-database',
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 0237271..032d420 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -1337,3 +1337,29 @@ a|b|c|   d
 2|2|2|1004
 (2 rows)
 
+
+starting permutation: sys1 sysupd2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysupd2: 
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+ <waiting ...>
+step c1: COMMIT;
+step sysupd2: <... completed>
+step c2: COMMIT;
+
+starting permutation: sys1 sysmerge2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysmerge2: 
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+ <waiting ...>
+step c1: COMMIT;
+step sysmerge2: <... completed>
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/inplace-inval.out b/src/test/isolation/expected/inplace-inval.out
new file mode 100644
index 0000000..67b34ad
--- /dev/null
+++ b/src/test/isolation/expected/inplace-inval.out
@@ -0,0 +1,32 @@
+Parsed test spec with 3 sessions
+
+starting permutation: cachefill3 cir1 cic2 ddl3 read1
+step cachefill3: TABLE newly_indexed;
+c
+-
+(0 rows)
+
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: cir1 cic2 ddl3 read1
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+t          
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
new file mode 100644
index 0000000..432ece5
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -0,0 +1,28 @@
+Parsed test spec with 3 sessions
+
+starting permutation: snap3 b1 grant1 vac2 snap3 c1 cmp3
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step b1: BEGIN;
+step grant1: 
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+
+step vac2: VACUUM (FREEZE);
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step c1: COMMIT;
+step cmp3: 
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+
+?column?              
+----------------------
+datfrozenxid retreated
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..cc1e47a
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,225 @@
+Parsed test spec with 5 sessions
+
+starting permutation: b1 grant1 read2 addk2 c1 read2
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: keyshr5 addk2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+
+starting permutation: keyshr5 b3 sfnku3 addk2 r3
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfnku3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+
+starting permutation: b2 sfnku2 addk2 c2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: keyshr5 b2 sfnku2 addk2 c2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b2 sfnku2 b1 grant1 addk2 c2 c1 read2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b1 grant1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+f          
+(1 row)
+
+s4: WARNING:  got: tuple concurrently updated
+step revoke4: <... completed>
+step r3: ROLLBACK;
+
+starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step drop1: 
+	DROP TABLE intra_grant_inplace;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+(0 rows)
+
+s4: WARNING:  got: tuple concurrently deleted
+step revoke4: <... completed>
+step r3: ROLLBACK;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0342eb3..6da98cf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,9 @@ test: fk-snapshot
 test: subxid-overflow
 test: eval-plan-qual
 test: eval-plan-qual-trigger
+test: inplace-inval
+test: intra-grant-inplace
+test: intra-grant-inplace-db
 test: lock-update-delete
 test: lock-update-traversal
 test: inherit-temp
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index edd6d19..3a74406 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,6 +194,12 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
+# test system class updates
+
+step sys1	{
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+}
+
 
 session s2
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
@@ -282,6 +288,18 @@ step wnested2 {
     );
 }
 
+step sysupd2	{
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+}
+
+step sysmerge2	{
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+}
+
 step c2	{ COMMIT; }
 step r2	{ ROLLBACK; }
 
@@ -380,3 +398,6 @@ permutation simplepartupdate complexpartupdate c1 c2 read_part
 permutation simplepartupdate_route1to2 complexpartupdate_route_err1 c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_route c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_doesnt_route c1 c2 read_part
+
+permutation sys1 sysupd2 c1 c2
+permutation sys1 sysmerge2 c1 c2
diff --git a/src/test/isolation/specs/inplace-inval.spec b/src/test/isolation/specs/inplace-inval.spec
new file mode 100644
index 0000000..d8e1c98
--- /dev/null
+++ b/src/test/isolation/specs/inplace-inval.spec
@@ -0,0 +1,38 @@
+# If a heap_update() caller retrieves its oldtup from a cache, it's possible
+# for that cache entry to predate an inplace update, causing loss of that
+# inplace update.  This arises because the transaction may abort before
+# sending the inplace invalidation message to the shared queue.
+
+setup
+{
+	CREATE TABLE newly_indexed (c int);
+}
+
+teardown
+{
+	DROP TABLE newly_indexed;
+}
+
+session s1
+step cir1	{ BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK; }
+step read1	{
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+}
+
+session s2
+step cic2	{ CREATE INDEX i2 ON newly_indexed (c); }
+
+session s3
+step cachefill3	{ TABLE newly_indexed; }
+step ddl3		{ ALTER TABLE newly_indexed ADD extra int; }
+
+
+permutation
+	cachefill3	# populates the pg_class row in the catcache
+	cir1	# sets relhasindex=true; rollback discards cache inval
+	cic2	# sees relhasindex=true, skips changing it (so no inval)
+	ddl3	# cached row as the oldtup of an update, losing relhasindex
+	read1	# observe damage XXX is an extant bug
+
+# without cachefill3, no bug
+permutation cir1 cic2 ddl3 read1
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
new file mode 100644
index 0000000..bbecd5d
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -0,0 +1,46 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, namely
+# datfrozenxid, need special code to cope.
+
+setup
+{
+	CREATE ROLE regress_temp_grantee;
+}
+
+teardown
+{
+	REVOKE ALL ON DATABASE isolation_regression FROM regress_temp_grantee;
+	DROP ROLE regress_temp_grantee;
+}
+
+# heap_update(pg_database)
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step vac2	{ VACUUM (FREEZE); }
+
+# observe datfrozenxid
+session s3
+setup	{
+	CREATE TEMP TABLE frozen_witness (x xid);
+}
+step snap3	{
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+}
+step cmp3	{
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+}
+
+
+# XXX extant bug
+permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..3cd696b
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,153 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  Inplace updates, such as
+# relhasindex=true, need special code to cope.
+
+setup
+{
+	CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+	DROP TABLE IF EXISTS intra_grant_inplace;
+}
+
+# heap_update()
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+}
+step drop1	{
+	DROP TABLE intra_grant_inplace;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step read2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+}
+step b2		{ BEGIN; }
+step addk2	{ ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+step sfnku2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step c2		{ COMMIT; }
+
+# rowmarks
+session s3
+step b3		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
+step sfnku3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step sfu3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+}
+step r3	{ ROLLBACK; }
+
+# Additional heap_update()
+session s4
+# swallow error message to keep any OID value out of expected output
+step revoke4	{
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+}
+
+# Additional rowmarks
+session s5
+setup	{ BEGIN; }
+step keyshr5	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+}
+teardown	{ ROLLBACK; }
+
+
+# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+
+permutation
+	b1
+	grant1
+	read2
+	addk2(c1)	# inplace waits
+	c1
+	read2
+
+# inplace thru KEY SHARE
+permutation
+	keyshr5
+	addk2
+
+# inplace wait NO KEY UPDATE w/ KEY SHARE
+permutation
+	keyshr5
+	b3
+	sfnku3
+	addk2(r3)
+	r3
+
+# same-xact rowmark
+permutation
+	b2
+	sfnku2
+	addk2
+	c2
+
+# same-xact rowmark in multixact
+permutation
+	keyshr5
+	b2
+	sfnku2
+	addk2
+	c2
+
+permutation
+	b3
+	sfu3
+	b1
+	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
+	read2
+	addk2(c1)	# block in LockTuple() behind grant1
+	r3			# unblock grant1; addk2 now awaits grant1 xmax
+	c1
+	read2
+
+permutation
+	b2
+	sfnku2
+	b1
+	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
+	addk2			# block in LockTuple() behind grant1 = deadlock
+	c2
+	c1
+	read2
+
+# SearchSysCacheLocked1() calling LockRelease()
+permutation
+	b1
+	grant1
+	b3
+	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1
+	r3			# revoke4 unlocks old tuple and finds new
+
+# SearchSysCacheLocked1() finding a tuple, then no tuple
+permutation
+	b1
+	drop1
+	b3
+	sfu3(c1)		# acquire LockTuple(), await drop1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1				# sfu3 locks none; revoke4 unlocks old and finds none
+	r3
diff --git a/src/test/regress/expected/database.out b/src/test/regress/expected/database.out
new file mode 100644
index 0000000..30c0865
--- /dev/null
+++ b/src/test/regress/expected/database.out
@@ -0,0 +1,14 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index eddc1f4..3d33259 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -2691,6 +2691,30 @@ drop cascades to table measurement_y2007m01
 DROP FUNCTION measurement_insert_trigger();
 -- prepare
 RESET SESSION AUTHORIZATION;
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+    oid    
+-----------
+ pg_depend
+(1 row)
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+ oid  
+------
+ 2608
+(1 row)
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 969ced9..2429ec2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -28,7 +28,7 @@ test: strings md5 numerology point lseg line box path polygon circle date time t
 # geometry depends on point, lseg, line, box, path, polygon, circle
 # horology depends on date, time, timetz, timestamp, timestamptz, interval
 # ----------
-test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc
+test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database
 
 # ----------
 # Load huge amounts of data
diff --git a/src/test/regress/sql/database.sql b/src/test/regress/sql/database.sql
new file mode 100644
index 0000000..6c61f2e
--- /dev/null
+++ b/src/test/regress/sql/database.sql
@@ -0,0 +1,16 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql
index 3d5d854..92163ec 100644
--- a/src/test/regress/sql/merge.sql
+++ b/src/test/regress/sql/merge.sql
@@ -1713,6 +1713,23 @@ DROP FUNCTION measurement_insert_trigger();
 -- prepare
 
 RESET SESSION AUTHORIZATION;
+
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
inplace031-inj-wait-event-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Add wait event type "InjectionPoint", a custom type like "Extension".
    
    Both injection points and customization of type "Extension" are new in
    v17, so this just changes a detail of an unreleased feature.
    
    Reported by Robert Haas.  Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/CA+TgmobfMU5pdXP36D5iAwxV5WKE_vuDLtp_1QyH+H5jMMt21g@mail.gmail.com

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 053da8d..8233f98 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1064,6 +1064,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
      <row>
+      <entry><literal>InjectionPoint</literal></entry>
+      <entry>The server process is waiting for an injection point to reach an
+       outcome defined in a test.  See
+       <xref linkend="xfunc-addin-injection-points"/> for more details.  This
+       type has no predefined wait points.
+      </entry>
+     </row>
+     <row>
       <entry><literal>IO</literal></entry>
       <entry>The server process is waiting for an I/O operation to complete.
        <literal>wait_event</literal> will identify the specific wait point;
@@ -1139,8 +1147,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
    <note>
     <para>
-     Extensions can add <literal>Extension</literal> and
-     <literal>LWLock</literal> events
+     Extensions can add <literal>Extension</literal>,
+     <literal>InjectionPoint</literal>, and <literal>LWLock</literal> events
      to the lists shown in <xref linkend="wait-event-extension-table"/> and
      <xref linkend="wait-event-lwlock-table"/>. In some cases, the name
      of an <literal>LWLock</literal> assigned by an extension will not be
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index a7c1704..66c1c30 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3643,7 +3643,11 @@ extern void InjectionPointAttach(const char *name,
 static void
 custom_injection_callback(const char *name, const void *private_data)
 {
+    uint32 wait_event_info = WaitEventInjectionPointNew(name);
+
+    pgstat_report_wait_start(wait_event_info);
     elog(NOTICE, "%s: executed custom callback", name);
+    pgstat_report_wait_end();
 }
 </programlisting>
      This callback prints a message to server error log with severity
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 521ed54..2100150 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -149,7 +149,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, SyncScanShmemSize());
 	size = add_size(size, AsyncShmemSize());
 	size = add_size(size, StatsShmemSize());
-	size = add_size(size, WaitEventExtensionShmemSize());
+	size = add_size(size, WaitEventCustomShmemSize());
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
 #ifdef EXEC_BACKEND
@@ -355,7 +355,7 @@ CreateOrAttachShmemStructs(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	StatsShmemInit();
-	WaitEventExtensionShmemInit();
+	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
 }
 
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 42f36f4..6a9c0a5 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -181,9 +181,10 @@ if ($gen_code)
 	foreach my $waitclass (sort { uc($a) cmp uc($b) } keys %hashwe)
 	{
 		# Don't generate the pgstat_wait_event.c and wait_event_types.h files
-		# for Extension, LWLock and Lock, these are handled independently.
+		# for types handled independently.
 		next
 		  if ( $waitclass eq 'WaitEventExtension'
+			|| $waitclass eq 'WaitEventInjectionPoint'
 			|| $waitclass eq 'WaitEventLWLock'
 			|| $waitclass eq 'WaitEventLock');
 
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 084a9df..b2b1281 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -47,68 +47,69 @@ uint32	   *my_wait_event_info = &local_my_wait_event_info;
  * Hash tables for storing custom wait event ids and their names in
  * shared memory.
  *
- * WaitEventExtensionHashById is used to find the name from an event id.
+ * WaitEventCustomHashById is used to find the name from an event id.
  * Any backend can search it to find custom wait events.
  *
- * WaitEventExtensionHashByName is used to find the event ID from a name.
+ * WaitEventCustomHashByName is used to find the ID from a name.
  * It is used to ensure that no duplicated entries are registered.
  *
+ * For simplicity, we use the same ID counter across types of custom events.
+ * We could end that anytime the need arises.
+ *
  * The size of the hash table is based on the assumption that
- * WAIT_EVENT_EXTENSION_HASH_INIT_SIZE is enough for most cases, and it seems
+ * WAIT_EVENT_CUSTOM_HASH_INIT_SIZE is enough for most cases, and it seems
  * unlikely that the number of entries will reach
- * WAIT_EVENT_EXTENSION_HASH_MAX_SIZE.
+ * WAIT_EVENT_CUSTOM_HASH_MAX_SIZE.
  */
-static HTAB *WaitEventExtensionHashById;	/* find names from IDs */
-static HTAB *WaitEventExtensionHashByName;	/* find IDs from names */
+static HTAB *WaitEventCustomHashById;	/* find names from IDs */
+static HTAB *WaitEventCustomHashByName; /* find IDs from names */
 
-#define WAIT_EVENT_EXTENSION_HASH_INIT_SIZE	16
-#define WAIT_EVENT_EXTENSION_HASH_MAX_SIZE	128
+#define WAIT_EVENT_CUSTOM_HASH_INIT_SIZE	16
+#define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE	128
 
 /* hash table entries */
-typedef struct WaitEventExtensionEntryById
+typedef struct WaitEventCustomEntryById
 {
 	uint16		event_id;		/* hash key */
 	char		wait_event_name[NAMEDATALEN];	/* custom wait event name */
-} WaitEventExtensionEntryById;
+} WaitEventCustomEntryById;
 
-typedef struct WaitEventExtensionEntryByName
+typedef struct WaitEventCustomEntryByName
 {
 	char		wait_event_name[NAMEDATALEN];	/* hash key */
-	uint16		event_id;		/* wait event ID */
-} WaitEventExtensionEntryByName;
+	uint32		wait_event_info;
+} WaitEventCustomEntryByName;
 
 
-/* dynamic allocation counter for custom wait events in extensions */
-typedef struct WaitEventExtensionCounterData
+/* dynamic allocation counter for custom wait events */
+typedef struct WaitEventCustomCounterData
 {
 	int			nextId;			/* next ID to assign */
 	slock_t		mutex;			/* protects the counter */
-} WaitEventExtensionCounterData;
+} WaitEventCustomCounterData;
 
 /* pointer to the shared memory */
-static WaitEventExtensionCounterData *WaitEventExtensionCounter;
+static WaitEventCustomCounterData *WaitEventCustomCounter;
 
-/* first event ID of custom wait events for extensions */
-#define WAIT_EVENT_EXTENSION_INITIAL_ID	1
+/* first event ID of custom wait events */
+#define WAIT_EVENT_CUSTOM_INITIAL_ID	1
 
-/* wait event info for extensions */
-#define WAIT_EVENT_EXTENSION_INFO(eventId)	(PG_WAIT_EXTENSION | eventId)
-
-static const char *GetWaitEventExtensionIdentifier(uint16 eventId);
+static uint32 WaitEventCustomNew(uint32 classId, const char *wait_event_name);
+static const char *GetWaitEventCustomIdentifier(uint16 eventId);
 
 /*
  *  Return the space for dynamic shared hash tables and dynamic allocation counter.
  */
 Size
-WaitEventExtensionShmemSize(void)
+WaitEventCustomShmemSize(void)
 {
 	Size		sz;
 
-	sz = MAXALIGN(sizeof(WaitEventExtensionCounterData));
-	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-										 sizeof(WaitEventExtensionEntryById)));
-	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-										 sizeof(WaitEventExtensionEntryByName)));
+	sz = MAXALIGN(sizeof(WaitEventCustomCounterData));
+	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+										 sizeof(WaitEventCustomEntryById)));
+	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+										 sizeof(WaitEventCustomEntryByName)));
 	return sz;
 }
 
@@ -116,39 +117,39 @@ WaitEventExtensionShmemSize(void)
  * Allocate shmem space for dynamic shared hash and dynamic allocation counter.
  */
 void
-WaitEventExtensionShmemInit(void)
+WaitEventCustomShmemInit(void)
 {
 	bool		found;
 	HASHCTL		info;
 
-	WaitEventExtensionCounter = (WaitEventExtensionCounterData *)
-		ShmemInitStruct("WaitEventExtensionCounterData",
-						sizeof(WaitEventExtensionCounterData), &found);
+	WaitEventCustomCounter = (WaitEventCustomCounterData *)
+		ShmemInitStruct("WaitEventCustomCounterData",
+						sizeof(WaitEventCustomCounterData), &found);
 
 	if (!found)
 	{
 		/* initialize the allocation counter and its spinlock. */
-		WaitEventExtensionCounter->nextId = WAIT_EVENT_EXTENSION_INITIAL_ID;
-		SpinLockInit(&WaitEventExtensionCounter->mutex);
+		WaitEventCustomCounter->nextId = WAIT_EVENT_CUSTOM_INITIAL_ID;
+		SpinLockInit(&WaitEventCustomCounter->mutex);
 	}
 
 	/* initialize or attach the hash tables to store custom wait events */
 	info.keysize = sizeof(uint16);
-	info.entrysize = sizeof(WaitEventExtensionEntryById);
-	WaitEventExtensionHashById = ShmemInitHash("WaitEventExtension hash by id",
-											   WAIT_EVENT_EXTENSION_HASH_INIT_SIZE,
-											   WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-											   &info,
-											   HASH_ELEM | HASH_BLOBS);
+	info.entrysize = sizeof(WaitEventCustomEntryById);
+	WaitEventCustomHashById = ShmemInitHash("WaitEventCustom hash by id",
+											WAIT_EVENT_CUSTOM_HASH_INIT_SIZE,
+											WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+											&info,
+											HASH_ELEM | HASH_BLOBS);
 
 	/* key is a NULL-terminated string */
 	info.keysize = sizeof(char[NAMEDATALEN]);
-	info.entrysize = sizeof(WaitEventExtensionEntryByName);
-	WaitEventExtensionHashByName = ShmemInitHash("WaitEventExtension hash by name",
-												 WAIT_EVENT_EXTENSION_HASH_INIT_SIZE,
-												 WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-												 &info,
-												 HASH_ELEM | HASH_STRINGS);
+	info.entrysize = sizeof(WaitEventCustomEntryByName);
+	WaitEventCustomHashByName = ShmemInitHash("WaitEventCustom hash by name",
+											  WAIT_EVENT_CUSTOM_HASH_INIT_SIZE,
+											  WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+											  &info,
+											  HASH_ELEM | HASH_STRINGS);
 }
 
 /*
@@ -160,10 +161,23 @@ WaitEventExtensionShmemInit(void)
 uint32
 WaitEventExtensionNew(const char *wait_event_name)
 {
+	return WaitEventCustomNew(PG_WAIT_EXTENSION, wait_event_name);
+}
+
+uint32
+WaitEventInjectionPointNew(const char *wait_event_name)
+{
+	return WaitEventCustomNew(PG_WAIT_INJECTIONPOINT, wait_event_name);
+}
+
+static uint32
+WaitEventCustomNew(uint32 classId, const char *wait_event_name)
+{
 	uint16		eventId;
 	bool		found;
-	WaitEventExtensionEntryByName *entry_by_name;
-	WaitEventExtensionEntryById *entry_by_id;
+	WaitEventCustomEntryByName *entry_by_name;
+	WaitEventCustomEntryById *entry_by_id;
+	uint32		wait_event_info;
 
 	/* Check the limit of the length of the event name */
 	if (strlen(wait_event_name) >= NAMEDATALEN)
@@ -175,13 +189,24 @@ WaitEventExtensionNew(const char *wait_event_name)
 	 * Check if the wait event info associated to the name is already defined,
 	 * and return it if so.
 	 */
-	LWLockAcquire(WaitEventExtensionLock, LW_SHARED);
-	entry_by_name = (WaitEventExtensionEntryByName *)
-		hash_search(WaitEventExtensionHashByName, wait_event_name,
+	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
+	entry_by_name = (WaitEventCustomEntryByName *)
+		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_FIND, &found);
-	LWLockRelease(WaitEventExtensionLock);
+	LWLockRelease(WaitEventCustomLock);
 	if (found)
-		return WAIT_EVENT_EXTENSION_INFO(entry_by_name->event_id);
+	{
+		uint32		oldClassId;
+
+		oldClassId = entry_by_name->wait_event_info & WAIT_EVENT_CLASS_MASK;
+		if (oldClassId != classId)
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("wait event \"%s\" already exists in type \"%s\"",
+							wait_event_name,
+							pgstat_get_wait_event_type(entry_by_name->wait_event_info))));
+		return entry_by_name->wait_event_info;
+	}
 
 	/*
 	 * Allocate and register a new wait event.  Recheck if the event name
@@ -189,69 +214,79 @@ WaitEventExtensionNew(const char *wait_event_name)
 	 * one with the same name since the LWLock acquired again here was
 	 * previously released.
 	 */
-	LWLockAcquire(WaitEventExtensionLock, LW_EXCLUSIVE);
-	entry_by_name = (WaitEventExtensionEntryByName *)
-		hash_search(WaitEventExtensionHashByName, wait_event_name,
+	LWLockAcquire(WaitEventCustomLock, LW_EXCLUSIVE);
+	entry_by_name = (WaitEventCustomEntryByName *)
+		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_FIND, &found);
 	if (found)
 	{
-		LWLockRelease(WaitEventExtensionLock);
-		return WAIT_EVENT_EXTENSION_INFO(entry_by_name->event_id);
+		uint32		oldClassId;
+
+		LWLockRelease(WaitEventCustomLock);
+		oldClassId = entry_by_name->wait_event_info & WAIT_EVENT_CLASS_MASK;
+		if (oldClassId != classId)
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("wait event \"%s\" already exists in type \"%s\"",
+							wait_event_name,
+							pgstat_get_wait_event_type(entry_by_name->wait_event_info))));
+		return entry_by_name->wait_event_info;
 	}
 
 	/* Allocate a new event Id */
-	SpinLockAcquire(&WaitEventExtensionCounter->mutex);
+	SpinLockAcquire(&WaitEventCustomCounter->mutex);
 
-	if (WaitEventExtensionCounter->nextId >= WAIT_EVENT_EXTENSION_HASH_MAX_SIZE)
+	if (WaitEventCustomCounter->nextId >= WAIT_EVENT_CUSTOM_HASH_MAX_SIZE)
 	{
-		SpinLockRelease(&WaitEventExtensionCounter->mutex);
+		SpinLockRelease(&WaitEventCustomCounter->mutex);
 		ereport(ERROR,
 				errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				errmsg("too many wait events for extensions"));
+				errmsg("too many custom wait events"));
 	}
 
-	eventId = WaitEventExtensionCounter->nextId++;
+	eventId = WaitEventCustomCounter->nextId++;
 
-	SpinLockRelease(&WaitEventExtensionCounter->mutex);
+	SpinLockRelease(&WaitEventCustomCounter->mutex);
 
 	/* Register the new wait event */
-	entry_by_id = (WaitEventExtensionEntryById *)
-		hash_search(WaitEventExtensionHashById, &eventId,
+	entry_by_id = (WaitEventCustomEntryById *)
+		hash_search(WaitEventCustomHashById, &eventId,
 					HASH_ENTER, &found);
 	Assert(!found);
 	strlcpy(entry_by_id->wait_event_name, wait_event_name,
 			sizeof(entry_by_id->wait_event_name));
 
-	entry_by_name = (WaitEventExtensionEntryByName *)
-		hash_search(WaitEventExtensionHashByName, wait_event_name,
+	entry_by_name = (WaitEventCustomEntryByName *)
+		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_ENTER, &found);
 	Assert(!found);
-	entry_by_name->event_id = eventId;
+	wait_event_info = classId | eventId;
+	entry_by_name->wait_event_info = wait_event_info;
 
-	LWLockRelease(WaitEventExtensionLock);
+	LWLockRelease(WaitEventCustomLock);
 
-	return WAIT_EVENT_EXTENSION_INFO(eventId);
+	return wait_event_info;
 }
 
 /*
- * Return the name of an wait event ID for extension.
+ * Return the name of a custom wait event ID.
  */
 static const char *
-GetWaitEventExtensionIdentifier(uint16 eventId)
+GetWaitEventCustomIdentifier(uint16 eventId)
 {
 	bool		found;
-	WaitEventExtensionEntryById *entry;
+	WaitEventCustomEntryById *entry;
 
 	/* Built-in event? */
-	if (eventId < WAIT_EVENT_EXTENSION_INITIAL_ID)
+	if (eventId < WAIT_EVENT_CUSTOM_INITIAL_ID)
 		return "Extension";
 
 	/* It is a user-defined wait event, so lookup hash table. */
-	LWLockAcquire(WaitEventExtensionLock, LW_SHARED);
-	entry = (WaitEventExtensionEntryById *)
-		hash_search(WaitEventExtensionHashById, &eventId,
+	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
+	entry = (WaitEventCustomEntryById *)
+		hash_search(WaitEventCustomHashById, &eventId,
 					HASH_FIND, &found);
-	LWLockRelease(WaitEventExtensionLock);
+	LWLockRelease(WaitEventCustomLock);
 
 	if (!entry)
 		elog(ERROR, "could not find custom wait event name for ID %u",
@@ -267,35 +302,35 @@ GetWaitEventExtensionIdentifier(uint16 eventId)
  * *nwaitevents.
  */
 char	  **
-GetWaitEventExtensionNames(int *nwaitevents)
+GetWaitEventCustomNames(uint32 classId, int *nwaitevents)
 {
 	char	  **waiteventnames;
-	WaitEventExtensionEntryByName *hentry;
+	WaitEventCustomEntryByName *hentry;
 	HASH_SEQ_STATUS hash_seq;
 	int			index;
 	int			els;
 
-	LWLockAcquire(WaitEventExtensionLock, LW_SHARED);
+	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
 
 	/* Now we can safely count the number of entries */
-	els = hash_get_num_entries(WaitEventExtensionHashByName);
+	els = hash_get_num_entries(WaitEventCustomHashByName);
 
 	/* Allocate enough space for all entries */
 	waiteventnames = palloc(els * sizeof(char *));
 
 	/* Now scan the hash table to copy the data */
-	hash_seq_init(&hash_seq, WaitEventExtensionHashByName);
+	hash_seq_init(&hash_seq, WaitEventCustomHashByName);
 
 	index = 0;
-	while ((hentry = (WaitEventExtensionEntryByName *) hash_seq_search(&hash_seq)) != NULL)
+	while ((hentry = (WaitEventCustomEntryByName *) hash_seq_search(&hash_seq)) != NULL)
 	{
+		if ((hentry->wait_event_info & WAIT_EVENT_CLASS_MASK) != classId)
+			continue;
 		waiteventnames[index] = pstrdup(hentry->wait_event_name);
 		index++;
 	}
 
-	LWLockRelease(WaitEventExtensionLock);
-
-	Assert(index == els);
+	LWLockRelease(WaitEventCustomLock);
 
 	*nwaitevents = index;
 	return waiteventnames;
@@ -374,6 +409,9 @@ pgstat_get_wait_event_type(uint32 wait_event_info)
 		case PG_WAIT_IO:
 			event_type = "IO";
 			break;
+		case PG_WAIT_INJECTIONPOINT:
+			event_type = "InjectionPoint";
+			break;
 		default:
 			event_type = "???";
 			break;
@@ -411,7 +449,8 @@ pgstat_get_wait_event(uint32 wait_event_info)
 			event_name = GetLockNameFromTagType(eventId);
 			break;
 		case PG_WAIT_EXTENSION:
-			event_name = GetWaitEventExtensionIdentifier(eventId);
+		case PG_WAIT_INJECTIONPOINT:
+			event_name = GetWaitEventCustomIdentifier(eventId);
 			break;
 		case PG_WAIT_BUFFERPIN:
 			{
diff --git a/src/backend/utils/activity/wait_event_funcs.c b/src/backend/utils/activity/wait_event_funcs.c
index ba244c2..fa8bc05 100644
--- a/src/backend/utils/activity/wait_event_funcs.c
+++ b/src/backend/utils/activity/wait_event_funcs.c
@@ -48,7 +48,7 @@ pg_get_wait_events(PG_FUNCTION_ARGS)
 #define PG_GET_WAIT_EVENTS_COLS 3
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	char	  **waiteventnames;
-	int			nbextwaitevents;
+	int			nbwaitevents;
 
 	/* Build tuplestore to hold the result rows */
 	InitMaterializedSRF(fcinfo, 0);
@@ -67,9 +67,10 @@ pg_get_wait_events(PG_FUNCTION_ARGS)
 	}
 
 	/* Handle custom wait events for extensions */
-	waiteventnames = GetWaitEventExtensionNames(&nbextwaitevents);
+	waiteventnames = GetWaitEventCustomNames(PG_WAIT_EXTENSION,
+											 &nbwaitevents);
 
-	for (int idx = 0; idx < nbextwaitevents; idx++)
+	for (int idx = 0; idx < nbwaitevents; idx++)
 	{
 		StringInfoData buf;
 		Datum		values[PG_GET_WAIT_EVENTS_COLS] = {0};
@@ -89,5 +90,29 @@ pg_get_wait_events(PG_FUNCTION_ARGS)
 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
 	}
 
+	/* Likewise for injection points */
+	waiteventnames = GetWaitEventCustomNames(PG_WAIT_INJECTIONPOINT,
+											 &nbwaitevents);
+
+	for (int idx = 0; idx < nbwaitevents; idx++)
+	{
+		StringInfoData buf;
+		Datum		values[PG_GET_WAIT_EVENTS_COLS] = {0};
+		bool		nulls[PG_GET_WAIT_EVENTS_COLS] = {0};
+
+
+		values[0] = CStringGetTextDatum("InjectionPoint");
+		values[1] = CStringGetTextDatum(waiteventnames[idx]);
+
+		initStringInfo(&buf);
+		appendStringInfo(&buf,
+						 "Waiting for injection point \"%s\"",
+						 waiteventnames[idx]);
+
+		values[2] = CStringGetTextDatum(buf.data);
+
+		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+	}
+
 	return (Datum) 0;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 87cbca2..db37bee 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -340,7 +340,7 @@ LogicalRepWorker	"Waiting to read or update the state of logical replication wor
 XactTruncation	"Waiting to execute <function>pg_xact_status</function> or update the oldest transaction ID available to it."
 WrapLimitsVacuum	"Waiting to update limits on transaction id and multixact consumption."
 NotifyQueueTail	"Waiting to update limit on <command>NOTIFY</command> message storage."
-WaitEventExtension	"Waiting to read or update custom wait events information for extensions."
+WaitEventCustom	"Waiting to read or update custom wait events information."
 WALSummarizer	"Waiting to read or update WAL summarization state."
 DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 85f6568..6a2f64c 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -78,7 +78,7 @@ PG_LWLOCK(44, XactTruncation)
 /* 45 was XactTruncationLock until removal of BackendRandomLock */
 PG_LWLOCK(46, WrapLimitsVacuum)
 PG_LWLOCK(47, NotifyQueueTail)
-PG_LWLOCK(48, WaitEventExtension)
+PG_LWLOCK(48, WaitEventCustom)
 PG_LWLOCK(49, WALSummarizer)
 PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 1b735d4..9f18a75 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -24,6 +24,7 @@
 #define PG_WAIT_IPC					0x08000000U
 #define PG_WAIT_TIMEOUT				0x09000000U
 #define PG_WAIT_IO					0x0A000000U
+#define PG_WAIT_INJECTIONPOINT		0x0B000000U
 
 /* enums for wait events */
 #include "utils/wait_event_types.h"
@@ -38,26 +39,28 @@ extern void pgstat_reset_wait_event_storage(void);
 extern PGDLLIMPORT uint32 *my_wait_event_info;
 
 
-/* ----------
- * Wait Events - Extension
+/*
+ * Wait Events - Extension, InjectionPoint
  *
- * Use this category when the server process is waiting for some condition
- * defined by an extension module.
+ * Use InjectionPoint when the server process is waiting in an injection
+ * point.  Use Extension for other cases of the server process waiting for
+ * some condition defined by an extension module.
  *
- * Extensions can define their own wait events in this category.  They should
- * call WaitEventExtensionNew() with a wait event string.  If the wait event
- * associated to a string is already allocated, it returns the wait event
- * information to use.  If not, it gets one wait event ID allocated from
+ * Extensions can define their own wait events in these categories.  They
+ * should call one of these functions with a wait event string.  If the wait
+ * event associated with a string is already allocated, it returns the wait
+ * event information to use.  If not, it gets one wait event ID allocated from
  * a shared counter, associates the string to the ID in the shared dynamic
  * hash and returns the wait event information.
  *
  * The ID retrieved can be used with pgstat_report_wait_start() or equivalent.
  */
-extern void WaitEventExtensionShmemInit(void);
-extern Size WaitEventExtensionShmemSize(void);
-
 extern uint32 WaitEventExtensionNew(const char *wait_event_name);
-extern char **GetWaitEventExtensionNames(int *nwaitevents);
+extern uint32 WaitEventInjectionPointNew(const char *wait_event_name);
+
+extern void WaitEventCustomShmemInit(void);
+extern Size WaitEventCustomShmemSize(void);
+extern char **GetWaitEventCustomNames(uint32 classId, int *nwaitevents);
 
 /* ----------
  * pgstat_report_wait_start() -
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 5c44625..1b695a1 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -216,7 +216,7 @@ injection_wait(const char *name, const void *private_data)
 	 * this custom wait event name is not released, but we don't care much for
 	 * testing as this should be short-lived.
 	 */
-	injection_wait_event = WaitEventExtensionNew(name);
+	injection_wait_event = WaitEventInjectionPointNew(name);
 
 	/*
 	 * Find a free slot to wait for, and register this injection point's name.
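For reference, a minimal extension-side sketch of the custom wait event API
documented in wait_event.h above (the module function and event name are
invented for illustration; WaitEventExtensionNew() and the
pgstat_report_wait_* calls are the real entry points):

#include "postgres.h"
#include "utils/wait_event.h"

/* Cached custom wait event ID; zero means "not allocated yet". */
static uint32 my_sleep_wait_event = 0;

static void
my_extension_blocking_call(void)
{
	/*
	 * The first call allocates an ID from the shared counter; repeated
	 * calls with the same string return the same ID, so caching it here
	 * is only an optimization.
	 */
	if (my_sleep_wait_event == 0)
		my_sleep_wait_event = WaitEventExtensionNew("MyExtensionSleep");

	pgstat_report_wait_start(my_sleep_wait_event);
	/* ... the actual blocking operation goes here ... */
	pgstat_report_wait_end();
}

An injection-point user follows the same pattern with
WaitEventInjectionPointNew(), as injection_wait() above now does.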
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 61ad417..5696604 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3099,9 +3099,9 @@ WaitEvent
 WaitEventActivity
 WaitEventBufferPin
 WaitEventClient
-WaitEventExtensionCounterData
-WaitEventExtensionEntryById
-WaitEventExtensionEntryByName
+WaitEventCustomCounterData
+WaitEventCustomEntryById
+WaitEventCustomEntryByName
 WaitEventIO
 WaitEventIPC
 WaitEventSet
Attachment: inplace040-waitfuncs-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Create waitfuncs.c for pg_isolation_test_session_is_blocked().
    
    The next commit makes the function inspect an additional non-lock
    contention source, so it no longer fits in lockfuncs.c.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 610ccf2..edb09d4 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -116,6 +116,7 @@ OBJS = \
 	varchar.o \
 	varlena.o \
 	version.o \
+	waitfuncs.o \
 	windowfuncs.o \
 	xid.o \
 	xid8funcs.o \
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 13009cc..e790f85 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -13,7 +13,6 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
-#include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/predicate_internals.h"
@@ -602,84 +601,6 @@ pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS)
 
 
 /*
- * pg_isolation_test_session_is_blocked - support function for isolationtester
- *
- * Check if specified PID is blocked by any of the PIDs listed in the second
- * argument.  Currently, this looks for blocking caused by waiting for
- * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
- * not directly under the isolationtester's control, eg autovacuum.
- *
- * This is an undocumented function intended for use by the isolation tester,
- * and may change in future releases as required for testing purposes.
- */
-Datum
-pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
-{
-	int			blocked_pid = PG_GETARG_INT32(0);
-	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
-	ArrayType  *blocking_pids_a;
-	int32	   *interesting_pids;
-	int32	   *blocking_pids;
-	int			num_interesting_pids;
-	int			num_blocking_pids;
-	int			dummy;
-	int			i,
-				j;
-
-	/* Validate the passed-in array */
-	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
-	if (array_contains_nulls(interesting_pids_a))
-		elog(ERROR, "array must not contain nulls");
-	interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
-	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
-										  ARR_DIMS(interesting_pids_a));
-
-	/*
-	 * Get the PIDs of all sessions blocking the given session's attempt to
-	 * acquire heavyweight locks.
-	 */
-	blocking_pids_a =
-		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
-
-	Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
-	Assert(!array_contains_nulls(blocking_pids_a));
-	blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
-	num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
-									   ARR_DIMS(blocking_pids_a));
-
-	/*
-	 * Check if any of these are in the list of interesting PIDs, that being
-	 * the sessions that the isolation tester is running.  We don't use
-	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
-	 * our goals is to run quickly with debug_discard_caches > 0.  We expect
-	 * blocking_pids to be usually empty and otherwise a very small number in
-	 * isolation tester cases, so make that the outer loop of a naive search
-	 * for a match.
-	 */
-	for (i = 0; i < num_blocking_pids; i++)
-		for (j = 0; j < num_interesting_pids; j++)
-		{
-			if (blocking_pids[i] == interesting_pids[j])
-				PG_RETURN_BOOL(true);
-		}
-
-	/*
-	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
-	 * theory check the resulting array of blocker PIDs against the
-	 * interesting PIDs list, but since there is no danger of autovacuum
-	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
-	 * on allocating a buffer and searching for overlap; so it's presently
-	 * sufficient for the isolation tester's purposes to use a single element
-	 * buffer and check if the number of safe snapshot blockers is non-zero.
-	 */
-	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
-		PG_RETURN_BOOL(true);
-
-	PG_RETURN_BOOL(false);
-}
-
-
-/*
  * Functions for manipulating advisory locks
  *
  * We make use of the locktag fields as follows:
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 48dbcf5..8c6fc80 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -103,6 +103,7 @@ backend_sources += files(
   'varchar.c',
   'varlena.c',
   'version.c',
+  'waitfuncs.c',
   'windowfuncs.c',
   'xid.c',
   'xid8funcs.c',
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
new file mode 100644
index 0000000..d9c92c3
--- /dev/null
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -0,0 +1,96 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitfuncs.c
+ *		Functions for SQL access to syntheses of multiple contention types.
+ *
+ * Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/backend/utils/adt/waitfuncs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "storage/predicate_internals.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+
+
+/*
+ * pg_isolation_test_session_is_blocked - support function for isolationtester
+ *
+ * Check if specified PID is blocked by any of the PIDs listed in the second
+ * argument.  Currently, this looks for blocking caused by waiting for
+ * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
+ * not directly under the isolationtester's control, eg autovacuum.
+ *
+ * This is an undocumented function intended for use by the isolation tester,
+ * and may change in future releases as required for testing purposes.
+ */
+Datum
+pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
+{
+	int			blocked_pid = PG_GETARG_INT32(0);
+	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
+	ArrayType  *blocking_pids_a;
+	int32	   *interesting_pids;
+	int32	   *blocking_pids;
+	int			num_interesting_pids;
+	int			num_blocking_pids;
+	int			dummy;
+	int			i,
+				j;
+
+	/* Validate the passed-in array */
+	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
+	if (array_contains_nulls(interesting_pids_a))
+		elog(ERROR, "array must not contain nulls");
+	interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
+	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
+										  ARR_DIMS(interesting_pids_a));
+
+	/*
+	 * Get the PIDs of all sessions blocking the given session's attempt to
+	 * acquire heavyweight locks.
+	 */
+	blocking_pids_a =
+		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
+
+	Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
+	Assert(!array_contains_nulls(blocking_pids_a));
+	blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
+	num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
+									   ARR_DIMS(blocking_pids_a));
+
+	/*
+	 * Check if any of these are in the list of interesting PIDs, that being
+	 * the sessions that the isolation tester is running.  We don't use
+	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
+	 * our goals is to run quickly with debug_discard_caches > 0.  We expect
+	 * blocking_pids to be usually empty and otherwise a very small number in
+	 * isolation tester cases, so make that the outer loop of a naive search
+	 * for a match.
+	 */
+	for (i = 0; i < num_blocking_pids; i++)
+		for (j = 0; j < num_interesting_pids; j++)
+		{
+			if (blocking_pids[i] == interesting_pids[j])
+				PG_RETURN_BOOL(true);
+		}
+
+	/*
+	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
+	 * theory check the resulting array of blocker PIDs against the
+	 * interesting PIDs list, but since there is no danger of autovacuum
+	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
+	 * on allocating a buffer and searching for overlap; so it's presently
+	 * sufficient for the isolation tester's purposes to use a single element
+	 * buffer and check if the number of safe snapshot blockers is non-zero.
+	 */
+	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
+		PG_RETURN_BOOL(true);
+
+	PG_RETURN_BOOL(false);
+}
Attachment: inplace050-tests-inj-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Add an injection_points isolation test suite.
    
    Make the isolation harness recognize injection_points wait events as a
    type of blocked state.  To simplify that, change that wait event naming
    scheme to INJECTION_POINT(name).  Add an initial test for an extant
    inplace-update bug.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 82bb9cb..91b2014 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -63,6 +63,7 @@
 #include "storage/procarray.h"
 #include "storage/standby.h"
 #include "utils/datum.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
@@ -6080,6 +6081,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+	INJECTION_POINT("inplace-before-pin");
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 	page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
index d9c92c3..e135c9e 100644
--- a/src/backend/utils/adt/waitfuncs.c
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -14,8 +14,13 @@
 
 #include "catalog/pg_type.h"
 #include "storage/predicate_internals.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/wait_event.h"
+
+#define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
 
 
 /*
@@ -23,8 +28,9 @@
  *
  * Check if specified PID is blocked by any of the PIDs listed in the second
  * argument.  Currently, this looks for blocking caused by waiting for
- * heavyweight locks or safe snapshots.  We ignore blockage caused by PIDs
- * not directly under the isolationtester's control, eg autovacuum.
+ * injection points, heavyweight locks, or safe snapshots.  We ignore blockage
+ * caused by PIDs not directly under the isolationtester's control, eg
+ * autovacuum.
  *
  * This is an undocumented function intended for use by the isolation tester,
  * and may change in future releases as required for testing purposes.
@@ -34,6 +40,8 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 {
 	int			blocked_pid = PG_GETARG_INT32(0);
 	ArrayType  *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
+	PGPROC	   *proc;
+	const char *wait_event_type;
 	ArrayType  *blocking_pids_a;
 	int32	   *interesting_pids;
 	int32	   *blocking_pids;
@@ -43,6 +51,15 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 	int			i,
 				j;
 
+	/* Check if blocked_pid is in an injection point. */
+	proc = BackendPidGetProc(blocked_pid);
+	if (proc == NULL)
+		PG_RETURN_BOOL(false);	/* session gone: definitely unblocked */
+	wait_event_type =
+		pgstat_get_wait_event_type(UINT32_ACCESS_ONCE(proc->wait_event_info));
+	if (wait_event_type && strcmp("InjectionPoint", wait_event_type) == 0)
+		PG_RETURN_BOOL(true);
+
 	/* Validate the passed-in array */
 	Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
 	if (array_contains_nulls(interesting_pids_a))
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 31bd787..2ffd2f7 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -9,6 +9,8 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
+ISOLATION = inplace
+
 # The injection points are cluster-wide, so disable installcheck
 NO_INSTALLCHECK = 1
 
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
new file mode 100644
index 0000000..123f45a
--- /dev/null
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -0,0 +1,43 @@
+Parsed test spec with 3 sessions
+
+starting permutation: vac1 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+ERROR:  could not create unique index "pg_class_oid_index"
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 8e1b5b4..3c23c14 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -37,4 +37,9 @@ tests += {
     # The injection points are cluster-wide, so disable installcheck
     'runningcheck': false,
   },
+  'isolation': {
+    'specs': [
+      'inplace',
+    ],
+  },
 }
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
new file mode 100644
index 0000000..e957713
--- /dev/null
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -0,0 +1,83 @@
+# Test race conditions involving:
+# - s1: VACUUM inplace-updating a pg_class row
+# - s2: GRANT/REVOKE making pg_class rows dead
+# - s3: "VACUUM pg_class" making dead rows LP_UNUSED; DDL reusing them
+
+# Need GRANT to make a non-HOT update.  Otherwise, "VACUUM pg_class" would
+# leave an LP_REDIRECT that persists.  To get non-HOT, make rels so the
+# pg_class row for vactest.orig50 is on a filled page (assuming BLCKSZ=8192).
+# Just to save on filesystem syscalls, use relkind=c for every other rel.
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA vactest;
+	CREATE FUNCTION vactest.mkrels(text, int, int) RETURNS void
+		LANGUAGE plpgsql SET search_path = vactest AS $$
+	DECLARE
+		tname text;
+	BEGIN
+		FOR i in $2 .. $3 LOOP
+			tname := $1 || i;
+			EXECUTE FORMAT('CREATE TYPE ' || tname || ' AS ()');
+			RAISE DEBUG '% at %', tname, ctid
+				FROM pg_class WHERE oid = tname::regclass;
+		END LOOP;
+	END
+	$$;
+}
+setup	{ VACUUM FULL pg_class;  -- reduce free space }
+setup
+{
+	SELECT vactest.mkrels('orig', 1, 49);
+	CREATE TABLE vactest.orig50 ();
+	SELECT vactest.mkrels('orig', 51, 100);
+}
+
+# XXX DROP causes an assertion failure; adopt DROP once fixed
+teardown
+{
+	--DROP SCHEMA vactest CASCADE;
+	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP EXTENSION injection_points;
+}
+
+# Wait during inplace update, in a VACUUM of vactest.orig50.
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('inplace-before-pin', 'wait');
+}
+step vac1	{ VACUUM vactest.orig50;  -- wait during inplace update }
+# One bug scenario leaves two live pg_class tuples for vactest.orig50 and zero
+# live tuples for one of the "intruder" rels.  REINDEX observes the duplicate.
+step read1	{
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+}
+
+
+# Transactional updates of the tuple vac1 is waiting to inplace-update.
+session s2
+step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
+
+
+# Non-blocking actions.
+session s3
+step vac3		{ VACUUM pg_class; }
+# Reuse the lp that vac1 is waiting to change.  I've observed reuse at the 1st
+# or 18th CREATE, so create excess.
+step mkrels3	{
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+}
+
+
+# XXX extant bug
+permutation
+	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	grant2			# T0 becomes eligible for pruning, T1 is successor
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	read1
Attachment: inplace060-nodeModifyTable-comments-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Expand comments and add an assertion in nodeModifyTable.c.
    
    Most comments concern RELKIND_VIEW.  One addresses the ExecUpdate()
    "tupleid" parameter.  A later commit will rely on these facts, but they
    hold already.  Back-patch to v12 (all supported versions), the plan for
    that commit.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3..a2442b7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -24,6 +24,14 @@
  *		values plus row-locating info for UPDATE and MERGE cases, or just the
  *		row-locating info for DELETE cases.
  *
+ *		The relation to modify can be an ordinary table, a view having an
+ *		INSTEAD OF trigger, or a foreign table.  Earlier processing already
+ *		pointed ModifyTable to the underlying relations of any automatically
+ *		updatable view not using an INSTEAD OF trigger, so code here can
+ *		assume it won't have one as a modification target.  This node does
+ *		process ri_WithCheckOptions, which may have expressions from those
+ *		automatically updatable views.
+ *
  *		MERGE runs a join between the source relation and the target table.
  *		If any WHEN NOT MATCHED [BY TARGET] clauses are present, then the join
  *		is an outer join that might output tuples without a matching target
@@ -1398,18 +1406,18 @@ ExecDeleteEpilogue(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  *		DELETE is like UPDATE, except that we delete the tuple and no
  *		index modifications are needed.
  *
- *		When deleting from a table, tupleid identifies the tuple to
- *		delete and oldtuple is NULL.  When deleting from a view,
- *		oldtuple is passed to the INSTEAD OF triggers and identifies
- *		what to delete, and tupleid is invalid.  When deleting from a
- *		foreign table, tupleid is invalid; the FDW has to figure out
- *		which row to delete using data from the planSlot.  oldtuple is
- *		passed to foreign table triggers; it is NULL when the foreign
- *		table has no relevant triggers.  We use tupleDeleted to indicate
- *		whether the tuple is actually deleted, callers can use it to
- *		decide whether to continue the operation.  When this DELETE is a
- *		part of an UPDATE of partition-key, then the slot returned by
- *		EvalPlanQual() is passed back using output parameter epqreturnslot.
+ *		When deleting from a table, tupleid identifies the tuple to delete and
+ *		oldtuple is NULL.  When deleting through a view INSTEAD OF trigger,
+ *		oldtuple is passed to the triggers and identifies what to delete, and
+ *		tupleid is invalid.  When deleting from a foreign table, tupleid is
+ *		invalid; the FDW has to figure out which row to delete using data from
+ *		the planSlot.  oldtuple is passed to foreign table triggers; it is
+ *		NULL when the foreign table has no relevant triggers.  We use
+ *		tupleDeleted to indicate whether the tuple is actually deleted,
+ *		callers can use it to decide whether to continue the operation.  When
+ *		this DELETE is a part of an UPDATE of partition-key, then the slot
+ *		returned by EvalPlanQual() is passed back using output parameter
+ *		epqreturnslot.
  *
  *		Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
@@ -2238,21 +2246,22 @@ ExecCrossPartitionUpdateForeignKey(ModifyTableContext *context,
  *		is, we don't want to get stuck in an infinite loop
  *		which corrupts your database..
  *
- *		When updating a table, tupleid identifies the tuple to
- *		update and oldtuple is NULL.  When updating a view, oldtuple
- *		is passed to the INSTEAD OF triggers and identifies what to
- *		update, and tupleid is invalid.  When updating a foreign table,
- *		tupleid is invalid; the FDW has to figure out which row to
- *		update using data from the planSlot.  oldtuple is passed to
- *		foreign table triggers; it is NULL when the foreign table has
- *		no relevant triggers.
+ *		When updating a table, tupleid identifies the tuple to update and
+ *		oldtuple is NULL.  When updating through a view INSTEAD OF trigger,
+ *		oldtuple is passed to the triggers and identifies what to update, and
+ *		tupleid is invalid.  When updating a foreign table, tupleid is
+ *		invalid; the FDW has to figure out which row to update using data from
+ *		the planSlot.  oldtuple is passed to foreign table triggers; it is
+ *		NULL when the foreign table has no relevant triggers.
  *
  *		slot contains the new tuple value to be stored.
  *		planSlot is the output of the ModifyTable's subplan; we use it
  *		to access values from other input tables (for RETURNING),
  *		row-ID junk columns, etc.
  *
- *		Returns RETURNING result if any, otherwise NULL.
+ *		Returns RETURNING result if any, otherwise NULL.  On exit, if tupleid
+ *		had identified the tuple to update, it will identify the tuple
+ *		actually updated after EvalPlanQual.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -2717,10 +2726,10 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 
 	/*-----
 	 * If we are dealing with a WHEN MATCHED case, tupleid or oldtuple is
-	 * valid, depending on whether the result relation is a table or a view.
-	 * We execute the first action for which the additional WHEN MATCHED AND
-	 * quals pass.  If an action without quals is found, that action is
-	 * executed.
+	 * valid, depending on whether the result relation is a table or a view
+	 * having an INSTEAD OF trigger.  We execute the first action for which
+	 * the additional WHEN MATCHED AND quals pass.  If an action without quals
+	 * is found, that action is executed.
 	 *
 	 * Similarly, in the WHEN NOT MATCHED BY SOURCE case, tupleid or oldtuple
 	 * is valid, and we look at the given WHEN NOT MATCHED BY SOURCE actions
@@ -2811,8 +2820,8 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  * Check and execute the first qualifying MATCHED or NOT MATCHED BY SOURCE
  * action, depending on whether the join quals are satisfied.  If the target
  * relation is a table, the current target tuple is identified by tupleid.
- * Otherwise, if the target relation is a view, oldtuple is the current target
- * tuple from the view.
+ * Otherwise, if the target relation is a view having an INSTEAD OF trigger,
+ * oldtuple is the current target tuple from the view.
  *
  * We start from the first WHEN MATCHED or WHEN NOT MATCHED BY SOURCE action
  * and check if the WHEN quals pass, if any. If the WHEN quals for the first
@@ -2878,8 +2887,11 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
 	if (oldtuple != NULL)
+	{
+		Assert(resultRelInfo->ri_TrigDesc);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
+	}
 	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
 											tupleid,
 											SnapshotAny,
@@ -3992,8 +4004,8 @@ ExecModifyTable(PlanState *pstate)
 			 * know enough here to set t_tableOid.  Quite separately from
 			 * this, the FDW may fetch its own junk attrs to identify the row.
 			 *
-			 * Other relevant relkinds, currently limited to views, always
-			 * have a wholerow attribute.
+			 * Other relevant relkinds, currently limited to views having
+			 * INSTEAD OF triggers, always have a wholerow attribute.
 			 */
 			else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
 			{
Attachment: inplace065-lock-SequenceChangePersistence-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Lock owned sequences during ALTER TABLE SET { LOGGED | UNLOGGED }.
    
    These commands already make the persistence of owned sequences follow
    owned table persistence changes.  They didn't lock those sequences.
    They lost the effect of nextval() calls that other sessions make after
    the ALTER TABLE command, before the ALTER TABLE transaction commits.
    Fix by acquiring the same lock that ALTER SEQUENCE SET { LOGGED |
    UNLOGGED } acquires.  This might cause more deadlocks.  Back-patch to
    v15, where commit 344d62fb9a978a72cf8347f0369b9ee643fd0b31 introduced
    unlogged sequences.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 28f8522..b4ad19c 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -545,6 +545,13 @@ SequenceChangePersistence(Oid relid, char newrelpersistence)
 	Buffer		buf;
 	HeapTupleData seqdatatuple;
 
+	/*
+	 * ALTER SEQUENCE acquires this lock earlier.  If we're processing an
+	 * owned sequence for ALTER TABLE, lock now.  Without the lock, we'd
+	 * discard increments from nextval() calls (in other sessions) between
+	 * this function's buffer unlock and this transaction's commit.
+	 */
+	LockRelationOid(relid, AccessExclusiveLock);
 	init_sequence(relid, &elm, &seqrel);
 
 	/* check the comment above nextval_internal()'s equivalent call. */
Attachment: inplace071-lock-SetRelationHasSubclass-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Lock before setting relhassubclass on RELKIND_PARTITIONED_INDEX.
    
    Commit 5b562644fec696977df4a82790064e8287927891 added a comment that
    SetRelationHasSubclass() callers must hold this lock.  When commit
    17f206fbc824d2b4b14480199ca9ff7dea417eda extended use of this column to
    partitioned indexes, it didn't take the lock.  As the latter commit
    message mentioned, we currently never reset a partitioned index to
    relhassubclass=f.  That largely avoids harm from the lock omission.  The
    cause for fixing this now is to unblock introducing a rule about locks
    required to heap_update() a pg_class row.  This might cause more
    deadlocks.  It gives minor user-visible benefits:
    
    - If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE
      ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks
      instead of failing with "tuple concurrently updated".  (Many cases of
      DDL concurrency still fail that way.)
    
    - Match ALTER INDEX ATTACH PARTITION in choosing to lock the index.
    
    While not user-visible today, we'll need this if we ever make something
    set the flag to false for a partitioned index, like ANALYZE does today
    for tables.  Back-patch to v12 (all supported versions), the plan for
    the commit relying on the new rule.  In back branches, add
    LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 55fdde4..a819b41 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1058,6 +1058,7 @@ index_create(Relation heapRelation,
 	if (OidIsValid(parentIndexRelid))
 	{
 		StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
+		LockRelationOid(parentIndexRelid, ShareUpdateExclusiveLock);
 		SetRelationHasSubclass(parentIndexRelid, true);
 	}
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 309389e..2caab88 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4355,7 +4355,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 
 	/* set relhassubclass if an index partition has been added to the parent */
 	if (OidIsValid(parentOid))
+	{
+		LockRelationOid(parentOid, ShareUpdateExclusiveLock);
 		SetRelationHasSubclass(parentOid, true);
+	}
 
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66cda26..8fcb188 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3483,8 +3483,15 @@ findAttrByName(const char *attributeName, const List *columns)
  * SetRelationHasSubclass
  *		Set the value of the relation's relhassubclass field in pg_class.
  *
- * NOTE: caller must be holding an appropriate lock on the relation.
- * ShareUpdateExclusiveLock is sufficient.
+ * It's always safe to set this field to true, because all SQL commands are
+ * ready to see true and then find no children.  On the other hand, commands
+ * generally assume zero children if this is false.
+ *
+ * Caller must hold any self-exclusive lock until end of transaction.  If the
+ * new value is false, caller must have acquired that lock before reading the
+ * evidence that justified the false value.  That way, it properly waits if
+ * another backend is simultaneously concluding no need to change the tuple
+ * (new and old values are true).
  *
  * NOTE: an important side-effect of this operation is that an SI invalidation
  * message is sent out to all backends --- including me --- causing plans
@@ -3499,6 +3506,11 @@ SetRelationHasSubclass(Oid relationId, bool relhassubclass)
 	HeapTuple	tuple;
 	Form_pg_class classtuple;
 
+	Assert(CheckRelationOidLockedByMe(relationId,
+									  ShareUpdateExclusiveLock, false) ||
+		   CheckRelationOidLockedByMe(relationId,
+									  ShareRowExclusiveLock, true));
+
 	/*
 	 * Fetch a modifiable copy of the tuple, modify it, update pg_class.
 	 */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index fe3cda2..094522a 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -335,32 +335,22 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
 						 relation->rd_lockInfo.lockRelId.dbId,
 						 relation->rd_lockInfo.lockRelId.relId);
 
-	if (LockHeldByMe(&tag, lockmode))
-		return true;
+	return LockHeldByMe(&tag, lockmode, orstronger);
+}
 
-	if (orstronger)
-	{
-		LOCKMODE	slockmode;
+/*
+ *		CheckRelationOidLockedByMe
+ *
+ * Like the above, but takes an OID as argument.
+ */
+bool
+CheckRelationOidLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+	LOCKTAG		tag;
 
-		for (slockmode = lockmode + 1;
-			 slockmode <= MaxLockMode;
-			 slockmode++)
-		{
-			if (LockHeldByMe(&tag, slockmode))
-			{
-#ifdef NOT_USED
-				/* Sometimes this might be useful for debugging purposes */
-				elog(WARNING, "lock mode %s substituted for %s on relation %s",
-					 GetLockmodeName(tag.locktag_lockmethodid, slockmode),
-					 GetLockmodeName(tag.locktag_lockmethodid, lockmode),
-					 RelationGetRelationName(relation));
-#endif
-				return true;
-			}
-		}
-	}
+	SetLocktagRelationOid(&tag, relid);
 
-	return false;
+	return LockHeldByMe(&tag, lockmode, orstronger);
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index f68c595..0400a50 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -578,11 +578,17 @@ DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2)
 }
 
 /*
- * LockHeldByMe -- test whether lock 'locktag' is held with mode 'lockmode'
- *		by the current transaction
+ * LockHeldByMe -- test whether lock 'locktag' is held by the current
+ *		transaction
+ *
+ * Returns true if current transaction holds a lock on 'locktag' of mode
+ * 'lockmode'.  If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
  */
 bool
-LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
+LockHeldByMe(const LOCKTAG *locktag,
+			 LOCKMODE lockmode, bool orstronger)
 {
 	LOCALLOCKTAG localtag;
 	LOCALLOCK  *locallock;
@@ -598,7 +604,23 @@ LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
 										  &localtag,
 										  HASH_FIND, NULL);
 
-	return (locallock && locallock->nLocks > 0);
+	if (locallock && locallock->nLocks > 0)
+		return true;
+
+	if (orstronger)
+	{
+		LOCKMODE	slockmode;
+
+		for (slockmode = lockmode + 1;
+			 slockmode <= MaxLockMode;
+			 slockmode++)
+		{
+			if (LockHeldByMe(locktag, slockmode, false))
+				return true;
+		}
+	}
+
+	return false;
 }
 
 #ifdef USE_ASSERT_CHECKING
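To illustrate the widened LockHeldByMe() signature, a caller wanting "this
mode or stronger" now passes true for the new parameter.  A hedged sketch of
an assertion (with "relid" as an assumed variable) could read:

	LOCKTAG		tag;

	SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);
	Assert(LockHeldByMe(&tag, ShareUpdateExclusiveLock, true));

For relations specifically, the CheckRelationOidLockedByMe() wrapper added in
lmgr.c above packages this pattern, as the new assertion in
SetRelationHasSubclass() shows.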
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 22b7856..ce15125 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,8 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
 extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
 extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
 									bool orstronger);
+extern bool CheckRelationOidLockedByMe(Oid relid, LOCKMODE lockmode,
+									   bool orstronger);
 extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
 
 extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 0017d4b..cc1f6e7 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -567,7 +567,8 @@ extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
 extern void LockReleaseSession(LOCKMETHODID lockmethodid);
 extern void LockReleaseCurrentOwner(LOCALLOCK **locallocks, int nlocks);
 extern void LockReassignCurrentOwner(LOCALLOCK **locallocks, int nlocks);
-extern bool LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode);
+extern bool LockHeldByMe(const LOCKTAG *locktag,
+						 LOCKMODE lockmode, bool orstronger);
 #ifdef USE_ASSERT_CHECKING
 extern HTAB *GetLockMethodLocalHash(void);
 #endif
Attachment: inplace075-lock-heap_create-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    AccessExclusiveLock new relations just after assigning the OID.
    
    This has no user-visible, important consequences, since other sessions'
    catalog scans can't find the relation until we commit.  However, this
    unblocks introducing a rule about locks required to heap_update() a
    pg_class row.  CREATE TABLE has been acquiring this lock eventually, but
    it can heap_update() pg_class.relchecks earlier.  create_toast_table()
    has been acquiring only ShareLock.  Back-patch to v12 (all supported
    versions), the plan for the commit relying on the new rule.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index a122bbf..ae2efdc 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1250,6 +1250,13 @@ heap_create_with_catalog(const char *relname,
 	}
 
 	/*
+	 * Other sessions' catalog scans can't find this until we commit.  Hence,
+	 * it doesn't hurt to hold AccessExclusiveLock.  Do it here so callers
+	 * can't accidentally vary in their lock mode or acquisition timing.
+	 */
+	LockRelationOid(relid, AccessExclusiveLock);
+
+	/*
 	 * Determine the relation's initial permissions.
 	 */
 	if (use_user_acl)
Attachment: inplace080-catcache-detoast-inplace-stale-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Cope with inplace update making catcache stale during TOAST fetch.
    
    This extends ad98fb14226ae6456fbaed7990ee7591cbe5efd2 to invals of
    inplace updates.  Trouble requires an inplace update of a catalog having
    a TOAST table, so only pg_database was at risk.  (The other catalog on
    which core code performs inplace updates, pg_class, has no TOAST table.)
    Trouble would require something like the inplace-inval.spec test.
    Consider GRANT ... ON DATABASE fetching a stale row from cache and
    discarding a datfrozenxid update that vac_truncate_clog() has already
    relied upon.  Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 3217008..6c39434 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -136,6 +136,27 @@ IsCatalogRelationOid(Oid relid)
 }
 
 /*
+ * IsInplaceUpdateRelation
+ *		True iff core code performs inplace updates on the relation.
+ */
+bool
+IsInplaceUpdateRelation(Relation relation)
+{
+	return IsInplaceUpdateOid(RelationGetRelid(relation));
+}
+
+/*
+ * IsInplaceUpdateOid
+ *		Like the above, but takes an OID as argument.
+ */
+bool
+IsInplaceUpdateOid(Oid relid)
+{
+	return (relid == RelationRelationId ||
+			relid == DatabaseRelationId);
+}
+
+/*
  * IsToastRelation
  *		True iff relation is a TOAST support relation (or index).
  *
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 569f51c..98aa527 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -2008,6 +2008,23 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
 
 
 /*
+ * equalTuple
+ *		Are these tuples memcmp()-equal?
+ */
+static bool
+equalTuple(HeapTuple a, HeapTuple b)
+{
+	uint32		alen;
+	uint32		blen;
+
+	alen = a->t_len;
+	blen = b->t_len;
+	return (alen == blen &&
+			memcmp((char *) a->t_data,
+				   (char *) b->t_data, blen) == 0);
+}
+
+/*
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
@@ -2057,14 +2074,34 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,
 		 */
 		if (HeapTupleHasExternal(ntp))
 		{
+			bool		need_cmp = IsInplaceUpdateOid(cache->cc_reloid);
+			HeapTuple	before = NULL;
+			bool		matches = true;
+
+			if (need_cmp)
+				before = heap_copytuple(ntp);
 			dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
 
 			/*
 			 * The tuple could become stale while we are doing toast table
-			 * access (since AcceptInvalidationMessages can run then), so we
-			 * must recheck its visibility afterwards.
+			 * access (since AcceptInvalidationMessages can run then).
+			 * equalTuple() detects staleness from inplace updates, while
+			 * systable_recheck_tuple() detects staleness from normal updates.
+			 *
+			 * While this equalTuple() follows the usual rule of reading with
+			 * a pin and no buffer lock, it warrants suspicion since an
+			 * inplace update could appear at any moment.  It's safe because
+			 * the inplace update sends an invalidation that can't reorder
+			 * before the inplace heap change.  If the heap change reaches
+			 * this process just after equalTuple() looks, we've not missed
+			 * its inval.
 			 */
-			if (!systable_recheck_tuple(scandesc, ntp))
+			if (need_cmp)
+			{
+				matches = equalTuple(before, ntp);
+				heap_freetuple(before);
+			}
+			if (!matches || !systable_recheck_tuple(scandesc, ntp))
 			{
 				heap_freetuple(dtp);
 				return NULL;
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 1fd326e..a8dd304 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -21,11 +21,13 @@
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 extern bool IsCatalogRelation(Relation relation);
+extern bool IsInplaceUpdateRelation(Relation relation);
 
 extern bool IsSystemClass(Oid relid, Form_pg_class reltuple);
 extern bool IsToastClass(Form_pg_class reltuple);
 
 extern bool IsCatalogRelationOid(Oid relid);
+extern bool IsInplaceUpdateOid(Oid relid);
 
 extern bool IsCatalogNamespace(Oid namespaceId);
 extern bool IsToastNamespace(Oid namespaceId);
Attachment: inplace090-LOCKTAG_TUPLE-eoxact-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..461d925 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
Attachment: inplace110-successors-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
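To make these rules concrete, here is a hedged caller sketch (not part of the
patch; the function name, "relid", and "newvalue" are invented) following the
standard flow described in heap_inplace_update_scan()'s header comment in the
heapam.c diff below.  It assumes the caller already holds a qualifying lock,
e.g. ShareUpdateExclusiveLock, on the relation whose pg_class row it
overwrites:

#include "postgres.h"
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/skey.h"
#include "access/stratnum.h"
#include "catalog/pg_class.h"
#include "utils/fmgroids.h"

static void
set_relhasindex_inplace(Relation pg_class_rel, Oid relid, bool newvalue)
{
	ScanKeyData key[1];
	HeapTuple	tup;
	void	   *state;

	ScanKeyInit(&key[0], Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(relid));

	/* On success, returns with the buffer exclusive-locked. */
	heap_inplace_update_scan(pg_class_rel, ClassOidIndexId, true, NULL,
							 1, key, &tup, &state);
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for relation %u", relid);

	if (((Form_pg_class) GETSTRUCT(tup))->relhasindex != newvalue)
	{
		((Form_pg_class) GETSTRUCT(tup))->relhasindex = newvalue;
		heap_inplace_update_finish(state, tup); /* overwrite, WAL, unlock */
	}
	else
		heap_inplace_update_cancel(state);	/* release without writing */

	heap_freetuple(tup);
}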
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..107507e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4072,6 +4076,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6041,34 +6084,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6081,21 +6135,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6210,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6243,188 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress.
+ * All inplace updaters take a lock that conflicts with DROP, so a delete
+ * should not happen here; if one does happen somehow, we'll wait for it
+ * like we would an update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would require either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a819b41..b4b68b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2784,7 +2784,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2818,33 +2820,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2907,11 +2888,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
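
To make the ordering requirement explicit, the sequence dropdb() now
executes is, in condensed form (a recap of the code above, not new logic):

	heap_inplace_update_scan(...);		/* lock and fetch the pg_database row */
	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
	heap_inplace_update_finish(...);	/* WAL-logged in-place overwrite */
	XLogFlush(XactLastRecEnd);			/* durable before irreversible steps */
	CatalogTupleDelete(...);			/* transactional deletion of the row */
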
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab..d299a25 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
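
For reference, the syscache precheck that the comment in
vac_update_datfrozenxid() declines would look roughly like the following.
It is a sketch only: the helper datfrozenxid_would_change() is
illustrative, not something the patch adds.

	/* hypothetical fast path, before heap_inplace_update_scan() */
	tuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
	if (HeapTupleIsValid(tuple))
	{
		bool		unchanged = !datfrozenxid_would_change(tuple);

		ReleaseSysCache(tuple);
		if (unchanged)
			return;			/* skip the buffer lock and any wait */
	}
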
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..2e13fb9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
Attachment: inplace120-locktag-v3.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
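
Concretely, a transactional update of a pg_class or pg_database row now
wraps the whole read-modify-write in the tuple lock, roughly as follows (a
sketch distilled from the aclchk.c and tablecmds.c hunks below; error
handling abbreviated):

	oldtup = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(oldtup))
		elog(ERROR, "cache lookup failed for relation %u", relid);
	newtup = heap_modify_tuple(oldtup, RelationGetDescr(rel),
							   values, nulls, replaces);
	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
	UnlockTuple(rel, &oldtup->t_self, InplaceUpdateTupleLock);
	ReleaseSysCache(oldtup);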

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 107507e..797bddf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3212,6 +3219,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4078,6 +4089,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6123,6 +6217,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6144,6 +6239,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6163,6 +6259,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6172,6 +6270,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6183,6 +6290,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6249,6 +6358,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6274,9 +6384,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6334,7 +6447,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 143876b..49d4d5e 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0,
@@ -2073,6 +2075,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2186,7 +2190,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2262,6 +2266,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, ownerId, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
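
The executor consumes this through ri_needLockTagTuple (see the execMain.c
and nodeModifyTable.c hunks below).  Schematically, an UPDATE of such a
relation brackets its fetch-modify-write sequence like this; a sketch of
the protocol, not code quoted from the patch:

	if (resultRelInfo->ri_needLockTagTuple)
		LockTuple(rel, tupleid, InplaceUpdateTupleLock);
	/* fetch the old row version, form the new tuple, table_tuple_update() */
	if (resultRelInfo->ri_needLockTagTuple)
		UnlockTuple(rel, tupleid, InplaceUpdateTupleLock);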
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88..8d04ca0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4409,14 +4409,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8fcb188..7fa80a5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3609,6 +3609,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3617,9 +3618,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3627,7 +3629,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4121,6 +4124,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4143,7 +4147,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4170,7 +4175,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14917,7 +14923,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15021,6 +15027,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17170,7 +17177,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17189,6 +17197,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17201,7 +17211,9 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17213,6 +17225,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a2442b7..b70d2f6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2320,6 +2320,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2328,6 +2330,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2421,6 +2424,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2525,6 +2536,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2850,6 +2869,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2886,17 +2906,33 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
 	{
 		Assert(resultRelInfo->ri_TrigDesc);
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
 	}
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even tuples that don't match mas_whenqual, which
+			 * isn't ideal.  MERGE on system catalogs is a minor use case, so
+			 * don't bother doing better.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2968,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2979,7 +3015,7 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
@@ -2999,7 +3035,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3017,7 +3054,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3028,7 +3065,7 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3109,7 +3146,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3183,7 +3220,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3201,6 +3238,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3229,7 +3275,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3257,13 +3303,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3310,6 +3356,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3761,6 +3811,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4073,6 +4124,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4084,6 +4137,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4092,6 +4146,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4103,6 +4162,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 35dbb87..cbf9cf2 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3767,6 +3767,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3809,11 +3810,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3933,9 +3935,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8bc421e..abd68e2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#34Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#33)
Re: race condition in pg_class

On Thu, Jun 13, 2024 at 05:35:49PM -0700, Noah Misch wrote:

I think the attached covers all comments to date. I gave everything v3, but
most patches have just a no-conflict rebase vs. v2. The exceptions are
inplace031-inj-wait-event (implements the holding from that branch of the
thread) and inplace050-tests-inj (updated to cooperate with inplace031). Much
of inplace031-inj-wait-event is essentially s/Extension/Custom/ for the
infrastructure common to the two custom wait event types.

Looking at inplace031-inj-wait-event..

The comment at the top of GetWaitEventCustomNames() requires an
update, still mentioning extensions.

GetWaitEventCustomIdentifier() is incorrect, and should return
"InjectionPoint" in the default case of this class name, no? I would
just pass the classID to GetWaitEventCustomIdentifier().

It is suboptimal to have pg_get_wait_events() do two scans of
WaitEventCustomHashByName. Wouldn't it be better to do a single scan,
returning a set of (class_name,event_name) fed to the tuplestore of
this SRF?
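
Something like a single pass over WaitEventCustomHashByName, i.e. (untested
sketch; values[]/nulls[] and rsinfo are the usual SRF locals, assumed here):

	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
	hash_seq_init(&hash_seq, WaitEventCustomHashByName);
	while ((entry = (WaitEventCustomEntryByName *)
			hash_seq_search(&hash_seq)) != NULL)
	{
		uint32		classId = entry->wait_event_info & 0xFF000000;

		values[0] = CStringGetTextDatum(classId == PG_WAIT_EXTENSION ?
										"Extension" : "InjectionPoint");
		values[1] = CStringGetTextDatum(entry->wait_event_name);
		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
							 values, nulls);
	}
	LWLockRelease(WaitEventCustomLock);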

 uint32
 WaitEventExtensionNew(const char *wait_event_name)
 {
+	return WaitEventCustomNew(PG_WAIT_EXTENSION, wait_event_name);
+}
+
+uint32
+WaitEventInjectionPointNew(const char *wait_event_name)
+{
+	return WaitEventCustomNew(PG_WAIT_INJECTIONPOINT, wait_event_name);
+}

Hmm. The advantage of two routines is that it is possible to control
the class IDs allowed to use the custom wait events. Shouldn't the
second routine be documented in xfunc.sgml?

wait_event_names.txt also needs tweaks, in the shape of a new class
name for the new class "InjectionPoint", so that its default case can
be documented. That's a fallback if an event ID cannot be found, which
should not happen, but it's still more correct than showing
"Extension" for all class IDs covered by custom wait events.
--
Michael

#35Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#34)
Re: race condition in pg_class

On Fri, Jun 14, 2024 at 09:58:59AM +0900, Michael Paquier wrote:

Looking at inplace031-inj-wait-event..

The comment at the top of GetWaitEventCustomNames() requires an
update, still mentioning extensions.

Thanks. Fixed locally.

GetWaitEventCustomIdentifier() is incorrect, and should return
"InjectionPoint" in the default case of this class name, no?

I intentionally didn't provide a default event ID for InjectionPoint.
PG_WAIT_EXTENSION needs a default case for backward compatibility, if nothing
else. For this second custom type, it's needless complexity. The value
0x0B000000U won't just show up like PG_WAIT_EXTENSION does.
GetLWLockIdentifier() also has no default case. How do you see it?

I would
just pass the classID to GetWaitEventCustomIdentifier().

As you say, that would allow eventId==0 to raise "could not find custom wait
event" for PG_WAIT_INJECTIONPOINT instead of wrongly returning "Extension".
Even if 0x0B000000U somehow does show up, having pg_stat_activity report
"Extension" instead of an error, in a developer test run, feels unimportant to
me.

It is suboptimal to have pg_get_wait_events() do two scans of
WaitEventCustomHashByName. Wouldn't it be better to do a single scan,
returning a set of (class_name,event_name) fed to the tuplestore of
this SRF?

Micro-optimization of pg_get_wait_events() doesn't matter. I did
consider that, as well as pushing more of the responsibility into
wait_event.c, but I weighed it on code-repetition grounds, not
performance grounds.

uint32
WaitEventExtensionNew(const char *wait_event_name)
{
+	return WaitEventCustomNew(PG_WAIT_EXTENSION, wait_event_name);
+}
+
+uint32
+WaitEventInjectionPointNew(const char *wait_event_name)
+{
+	return WaitEventCustomNew(PG_WAIT_INJECTIONPOINT, wait_event_name);
+}

Hmm. The advantage of two routines is that it is possible to control
the class IDs allowed to use the custom wait events. Shouldn't the
second routine be documented in xfunc.sgml?

The patch added to xfunc.sgml an example of using it. I'd be more inclined to
delete the WaitEventExtensionNew() docbook documentation than to add its level
of detail for WaitEventInjectionPointNew(). We don't have that kind of
documentation for most extension-facing C functions.

#36Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#35)
Re: race condition in pg_class

On Thu, Jun 13, 2024 at 07:42:25PM -0700, Noah Misch wrote:

On Fri, Jun 14, 2024 at 09:58:59AM +0900, Michael Paquier wrote:

GetWaitEventCustomIdentifier() is incorrect, and should return
"InjectionPoint" in the default case of this class name, no?

I intentionally didn't provide a default event ID for InjectionPoint.
PG_WAIT_EXTENSION needs a default case for backward compatibility, if nothing
else. For this second custom type, it's needless complexity. The value
0x0B000000U won't just show up like PG_WAIT_EXTENSION does.
GetLWLockIdentifier() also has no default case. How do you see it?

I would add a default for consistency, as it is just a few extra
lines, but if you feel strongly about that, I'm OK as well. It makes
it a bit easier to detect wait event numbers that extensions set
incorrectly for the class they wanted.

The patch added to xfunc.sgml an example of using it. I'd be more inclined to
delete the WaitEventExtensionNew() docbook documentation than to add its level
of detail for WaitEventInjectionPointNew(). We don't have that kind of
documentation for most extension-facing C functions.

It's one of the areas where I think we should have more documentation,
not less, so I'd rather keep it; maintaining it is not really a pain
(?). The backend is getting complicated enough these days that
limiting what developers have to guess on their own is the better
long-term approach, because the Postgres out-of-core ecosystem is
expanding a lot (i.e., let's also have in-core documentation for
hooks, even if there has historically been a lot of reluctance about
having them).
--
Michael

#37Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#36)
1 attachment(s)
Re: race condition in pg_class

On Sun, Jun 16, 2024 at 09:28:05AM +0900, Michael Paquier wrote:

On Thu, Jun 13, 2024 at 07:42:25PM -0700, Noah Misch wrote:

On Fri, Jun 14, 2024 at 09:58:59AM +0900, Michael Paquier wrote:

GetWaitEventCustomIdentifier() is incorrect, and should return
"InjectionPoint" in the default case of this class name, no?

I intentionally didn't provide a default event ID for InjectionPoint.
PG_WAIT_EXTENSION needs a default case for backward compatibility, if nothing
else. For this second custom type, it's needless complexity. The value
0x0B000000U won't just show up like PG_WAIT_EXTENSION does.
GetLWLockIdentifier() also has no default case. How do you see it?

I would add a default for consistency, as it is just a few extra
lines, but if you feel strongly about that, I'm OK as well. It makes
it a bit easier to detect wait event numbers that extensions set
incorrectly for the class they wanted.

It would be odd to detect exactly 0x0B000000U and not other invalid inputs,
like 0x0A000001U where only 0x0B000001U is valid. I'm attaching roughly what
it would take. Shall I squash this into inplace031?
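
With the hash keyed by the full wait_event_info, a wrong class bit pattern
fails the lookup instead of silently resolving under another class.
Hypothetical misuse, for illustration only (the 0x07000001 value assumes
this is the first custom event registered):

	uint32		extev = WaitEventExtensionNew("myext");	/* 0x07000001, say */
	uint32		bogus = PG_WAIT_INJECTIONPOINT | (extev & 0x0000FFFF);

	pgstat_get_wait_event(extev);	/* returns "myext" */
	pgstat_get_wait_event(bogus);	/* elog ERROR: could not find custom
									 * name for wait event information */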

The thing I feel strongly about here is keeping focus on fixing $SUBJECT bugs
that are actually corrupting data out there. I think we should all limit our
interest in the verbiage of strings that appear only when running developer
tests, especially when $SUBJECT is a bug fix. When the string appears only
after C code passes invalid input to other C code, it matters even less.

The patch added to xfunc.sgml an example of using it. I'd be more inclined to
delete the WaitEventExtensionNew() docbook documentation than to add its level
of detail for WaitEventInjectionPointNew(). We don't have that kind of
documentation for most extension-facing C functions.

It's one of the areas where I think we should have more documentation,
not less, so I'd rather keep it; maintaining it is not really a pain
(?). The backend is getting complicated enough these days that
limiting what developers have to guess on their own is the better
long-term approach, because the Postgres out-of-core ecosystem is
expanding a lot (i.e., let's also have in-core documentation for
hooks, even if there has historically been a lot of reluctance about
having them).

[getting deeply off topic -- let's move this to another thread if it needs to
expand] I like reducing the need to guess. So far in this inplace update
project (this thread plus postgr.es/m/20240615223718.42.nmisch@google.com),
three patches just fix comments. Even comments carry quite a price, but I
value them. When we hand-maintain documentation of a C function in both its
header comment and another place, I get skeptical about whether hackers
(including myself) will actually keep them in sync and skeptical of the
incremental value of maintaining the second version.

Attachments:

wait-detect-invalid-custom-v0.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 300de90..aaf9f3d 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -47,11 +47,11 @@ uint32	   *my_wait_event_info = &local_my_wait_event_info;
  * Hash tables for storing custom wait event ids and their names in
  * shared memory.
  *
- * WaitEventCustomHashById is used to find the name from an event id.
- * Any backend can search it to find custom wait events.
+ * WaitEventCustomHashByInfo is used to find the name from wait event
+ * information.  Any backend can search it to find custom wait events.
  *
- * WaitEventCustomHashByName is used to find the ID from a name.
- * It is used to ensure that no duplicated entries are registered.
+ * WaitEventCustomHashByName is used to find the wait event information from a
+ * name.  It is used to ensure that no duplicated entries are registered.
  *
  * For simplicity, we use the same ID counter across types of custom events.
  * We could end that anytime the need arises.
@@ -61,18 +61,18 @@ uint32	   *my_wait_event_info = &local_my_wait_event_info;
  * unlikely that the number of entries will reach
  * WAIT_EVENT_CUSTOM_HASH_MAX_SIZE.
  */
-static HTAB *WaitEventCustomHashById;	/* find names from IDs */
-static HTAB *WaitEventCustomHashByName; /* find IDs from names */
+static HTAB *WaitEventCustomHashByInfo; /* find names from infos */
+static HTAB *WaitEventCustomHashByName; /* find infos from names */
 
 #define WAIT_EVENT_CUSTOM_HASH_INIT_SIZE	16
 #define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE	128
 
 /* hash table entries */
-typedef struct WaitEventCustomEntryById
+typedef struct WaitEventCustomEntryByInfo
 {
-	uint16		event_id;		/* hash key */
+	uint32		wait_event_info;	/* hash key */
 	char		wait_event_name[NAMEDATALEN];	/* custom wait event name */
-} WaitEventCustomEntryById;
+} WaitEventCustomEntryByInfo;
 
 typedef struct WaitEventCustomEntryByName
 {
@@ -95,7 +95,7 @@ static WaitEventCustomCounterData *WaitEventCustomCounter;
 #define WAIT_EVENT_CUSTOM_INITIAL_ID	1
 
 static uint32 WaitEventCustomNew(uint32 classId, const char *wait_event_name);
-static const char *GetWaitEventCustomIdentifier(uint16 eventId);
+static const char *GetWaitEventCustomIdentifier(uint32 wait_event_info);
 
 /*
  *  Return the space for dynamic shared hash tables and dynamic allocation counter.
@@ -107,7 +107,7 @@ WaitEventCustomShmemSize(void)
 
 	sz = MAXALIGN(sizeof(WaitEventCustomCounterData));
 	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
-										 sizeof(WaitEventCustomEntryById)));
+										 sizeof(WaitEventCustomEntryByInfo)));
 	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
 										 sizeof(WaitEventCustomEntryByName)));
 	return sz;
@@ -134,13 +134,13 @@ WaitEventCustomShmemInit(void)
 	}
 
 	/* initialize or attach the hash tables to store custom wait events */
-	info.keysize = sizeof(uint16);
-	info.entrysize = sizeof(WaitEventCustomEntryById);
-	WaitEventCustomHashById = ShmemInitHash("WaitEventCustom hash by id",
-											WAIT_EVENT_CUSTOM_HASH_INIT_SIZE,
-											WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
-											&info,
-											HASH_ELEM | HASH_BLOBS);
+	info.keysize = sizeof(uint32);
+	info.entrysize = sizeof(WaitEventCustomEntryByInfo);
+	WaitEventCustomHashByInfo = ShmemInitHash("WaitEventCustom hash by wait event information",
+											  WAIT_EVENT_CUSTOM_HASH_INIT_SIZE,
+											  WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+											  &info,
+											  HASH_ELEM | HASH_BLOBS);
 
 	/* key is a NULL-terminated string */
 	info.keysize = sizeof(char[NAMEDATALEN]);
@@ -176,7 +176,7 @@ WaitEventCustomNew(uint32 classId, const char *wait_event_name)
 	uint16		eventId;
 	bool		found;
 	WaitEventCustomEntryByName *entry_by_name;
-	WaitEventCustomEntryById *entry_by_id;
+	WaitEventCustomEntryByInfo *entry_by_info;
 	uint32		wait_event_info;
 
 	/* Check the limit of the length of the event name */
@@ -249,18 +249,18 @@ WaitEventCustomNew(uint32 classId, const char *wait_event_name)
 	SpinLockRelease(&WaitEventCustomCounter->mutex);
 
 	/* Register the new wait event */
-	entry_by_id = (WaitEventCustomEntryById *)
-		hash_search(WaitEventCustomHashById, &eventId,
+	wait_event_info = classId | eventId;
+	entry_by_info = (WaitEventCustomEntryByInfo *)
+		hash_search(WaitEventCustomHashByInfo, &wait_event_info,
 					HASH_ENTER, &found);
 	Assert(!found);
-	strlcpy(entry_by_id->wait_event_name, wait_event_name,
-			sizeof(entry_by_id->wait_event_name));
+	strlcpy(entry_by_info->wait_event_name, wait_event_name,
+			sizeof(entry_by_info->wait_event_name));
 
 	entry_by_name = (WaitEventCustomEntryByName *)
 		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_ENTER, &found);
 	Assert(!found);
-	wait_event_info = classId | eventId;
 	entry_by_name->wait_event_info = wait_event_info;
 
 	LWLockRelease(WaitEventCustomLock);
@@ -269,28 +269,29 @@ WaitEventCustomNew(uint32 classId, const char *wait_event_name)
 }
 
 /*
- * Return the name of a custom wait event ID.
+ * Return the name of a custom wait event, given its wait event information.
  */
 static const char *
-GetWaitEventCustomIdentifier(uint16 eventId)
+GetWaitEventCustomIdentifier(uint32 wait_event_info)
 {
 	bool		found;
-	WaitEventCustomEntryById *entry;
+	WaitEventCustomEntryByInfo *entry;
 
 	/* Built-in event? */
-	if (eventId < WAIT_EVENT_CUSTOM_INITIAL_ID)
+	if (wait_event_info == PG_WAIT_EXTENSION)
 		return "Extension";
 
 	/* It is a user-defined wait event, so lookup hash table. */
 	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
-	entry = (WaitEventCustomEntryById *)
-		hash_search(WaitEventCustomHashById, &eventId,
+	entry = (WaitEventCustomEntryByInfo *)
+		hash_search(WaitEventCustomHashByInfo, &wait_event_info,
 					HASH_FIND, &found);
 	LWLockRelease(WaitEventCustomLock);
 
 	if (!entry)
-		elog(ERROR, "could not find custom wait event name for ID %u",
-			 eventId);
+		elog(ERROR,
+			 "could not find custom name for wait event information %u",
+			 wait_event_info);
 
 	return entry->wait_event_name;
 }
@@ -449,7 +450,7 @@ pgstat_get_wait_event(uint32 wait_event_info)
 			break;
 		case PG_WAIT_EXTENSION:
 		case PG_WAIT_INJECTIONPOINT:
-			event_name = GetWaitEventCustomIdentifier(eventId);
+			event_name = GetWaitEventCustomIdentifier(wait_event_info);
 			break;
 		case PG_WAIT_BUFFERPIN:
 			{
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5696604..75433b3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3100,7 +3100,7 @@ WaitEventActivity
 WaitEventBufferPin
 WaitEventClient
 WaitEventCustomCounterData
-WaitEventCustomEntryById
+WaitEventCustomEntryByInfo
 WaitEventCustomEntryByName
 WaitEventIO
 WaitEventIPC
#38Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#37)
Re: race condition in pg_class

On Sun, Jun 16, 2024 at 07:07:08AM -0700, Noah Misch wrote:

It would be odd to detect exactly 0x0B000000U and not other invalid inputs,
like 0x0A000001U where only 0x0B000001U is valid. I'm attaching roughly what
it would take. Shall I squash this into inplace031?

Agreed that merging both together is cleaner. Moving the event class
into the key of WaitEventCustomEntryByInfo leads to a more consistent
final result.

The thing I feel strongly about here is keeping focus on fixing $SUBJECT bugs
that are actually corrupting data out there.

Agreed to focus on that first.
--
Michael

#39Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#33)
2 attachment(s)
Re: race condition in pg_class

On Thu, Jun 13, 2024 at 05:35:49PM -0700, Noah Misch wrote:

On Mon, Jun 10, 2024 at 07:45:25PM -0700, Noah Misch wrote:

On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote:

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category

I've replied on that branch of the thread.

I think the attached covers all comments to date. I gave everything v3, but
most patches have just a no-conflict rebase vs. v2. The exceptions are
inplace031-inj-wait-event (implements the holding from that branch of the
thread) and inplace050-tests-inj (updated to cooperate with inplace031). Much
of inplace031-inj-wait-event is essentially s/Extension/Custom/ for the
infrastructure common to the two custom wait event types.

Starting 2024-06-27, I'd like to push
inplace080-catcache-detoast-inplace-stale and earlier patches, self-certifying
them if needed. Then I'll submit the last three to the commitfest. Does
anyone want me to delay that step?

Two more test-related changes compared to v3:

- In inplace010-tests, add to 027_stream_regress.pl a test that catalog
contents match between primary and standby. If one of these patches broke
replay of inplace updates, this would help catch it.

- In inplace031-inj-wait-event, make sysviews.sql indifferent to whether
InjectionPoint wait events exist. installcheck needs this if other activity
created such an event since the last postmaster restart.

Attachments:

inplace010-tests-v4.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Improve test coverage for changes to inplace-updated catalogs.
    
    This covers both regular and inplace changes, since bugs arise at their
    intersection.  Where marked, these witness extant bugs.  Back-patch to
    v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 59ea538..956e290 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -68,6 +68,34 @@ $node->pgbench(
 		  "CREATE TYPE pg_temp.e AS ENUM ($labels); DROP TYPE pg_temp.e;"
 	});
 
+# Test inplace updates from VACUUM concurrent with heap_update from GRANT.
+# The PROC_IN_VACUUM environment can't finish MVCC table scans consistently,
+# so this fails rarely.  To reproduce consistently, add a sleep after
+# GetCatalogSnapshot(non-catalog-rel).
+Test::More->builder->todo_start('PROC_IN_VACUUM scan breakage');
+$node->safe_psql('postgres', 'CREATE TABLE ddl_target ()');
+$node->pgbench(
+	'--no-vacuum --client=5 --protocol=prepared --transactions=50',
+	0,
+	[qr{processed: 250/250}],
+	[qr{^$}],
+	'concurrent GRANT/VACUUM',
+	{
+		'001_pgbench_grant@9' => q(
+			DO $$
+			BEGIN
+				PERFORM pg_advisory_xact_lock(42);
+				FOR i IN 1 .. 10 LOOP
+					GRANT SELECT ON ddl_target TO PUBLIC;
+					REVOKE SELECT ON ddl_target FROM PUBLIC;
+				END LOOP;
+			END
+			$$;
+),
+		'001_pgbench_vacuum_ddl_target@1' => "VACUUM ddl_target;",
+	});
+Test::More->builder->todo_end;
+
 # Trigger various connection errors
 $node->pgbench(
 	'no-such-database',
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 0237271..032d420 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -1337,3 +1337,29 @@ a|b|c|   d
 2|2|2|1004
 (2 rows)
 
+
+starting permutation: sys1 sysupd2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysupd2: 
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+ <waiting ...>
+step c1: COMMIT;
+step sysupd2: <... completed>
+step c2: COMMIT;
+
+starting permutation: sys1 sysmerge2 c1 c2
+step sys1: 
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+
+step sysmerge2: 
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+ <waiting ...>
+step c1: COMMIT;
+step sysmerge2: <... completed>
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/inplace-inval.out b/src/test/isolation/expected/inplace-inval.out
new file mode 100644
index 0000000..67b34ad
--- /dev/null
+++ b/src/test/isolation/expected/inplace-inval.out
@@ -0,0 +1,32 @@
+Parsed test spec with 3 sessions
+
+starting permutation: cachefill3 cir1 cic2 ddl3 read1
+step cachefill3: TABLE newly_indexed;
+c
+-
+(0 rows)
+
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: cir1 cic2 ddl3 read1
+step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK;
+step cic2: CREATE INDEX i2 ON newly_indexed (c);
+step ddl3: ALTER TABLE newly_indexed ADD extra int;
+step read1: 
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+
+relhasindex
+-----------
+t          
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
new file mode 100644
index 0000000..432ece5
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -0,0 +1,28 @@
+Parsed test spec with 3 sessions
+
+starting permutation: snap3 b1 grant1 vac2 snap3 c1 cmp3
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step b1: BEGIN;
+step grant1: 
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+
+step vac2: VACUUM (FREEZE);
+step snap3: 
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+
+step c1: COMMIT;
+step cmp3: 
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+
+?column?              
+----------------------
+datfrozenxid retreated
+(1 row)
+
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..cc1e47a
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,225 @@
+Parsed test spec with 5 sessions
+
+starting permutation: b1 grant1 read2 addk2 c1 read2
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: keyshr5 addk2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+
+starting permutation: keyshr5 b3 sfnku3 addk2 r3
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfnku3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+
+starting permutation: b2 sfnku2 addk2 c2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: keyshr5 b2 sfnku2 addk2 c2
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+
+starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step r3: ROLLBACK;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b2 sfnku2 b1 grant1 addk2 c2 c1 read2
+step b2: BEGIN;
+step sfnku2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+ <waiting ...>
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step c2: COMMIT;
+step grant1: <... completed>
+step c1: COMMIT;
+step read2: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+
+starting permutation: b1 grant1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step grant1: 
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+f          
+(1 row)
+
+s4: WARNING:  got: tuple concurrently updated
+step revoke4: <... completed>
+step r3: ROLLBACK;
+
+starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
+step b1: BEGIN;
+step drop1: 
+	DROP TABLE intra_grant_inplace;
+
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfu3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+ <waiting ...>
+step revoke4: 
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+ <waiting ...>
+step c1: COMMIT;
+step sfu3: <... completed>
+relhasindex
+-----------
+(0 rows)
+
+s4: WARNING:  got: tuple concurrently deleted
+step revoke4: <... completed>
+step r3: ROLLBACK;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0342eb3..6da98cf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,9 @@ test: fk-snapshot
 test: subxid-overflow
 test: eval-plan-qual
 test: eval-plan-qual-trigger
+test: inplace-inval
+test: intra-grant-inplace
+test: intra-grant-inplace-db
 test: lock-update-delete
 test: lock-update-traversal
 test: inherit-temp
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index edd6d19..3a74406 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,6 +194,12 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
+# test system class updates
+
+step sys1	{
+	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
+}
+
 
 session s2
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
@@ -282,6 +288,18 @@ step wnested2 {
     );
 }
 
+step sysupd2	{
+	UPDATE pg_class SET reltuples = reltuples * 2
+	WHERE oid = 'accounts'::regclass;
+}
+
+step sysmerge2	{
+	MERGE INTO pg_class
+	USING (SELECT 'accounts'::regclass AS o) j
+	ON o = oid
+	WHEN MATCHED THEN UPDATE SET reltuples = reltuples * 2;
+}
+
 step c2	{ COMMIT; }
 step r2	{ ROLLBACK; }
 
@@ -380,3 +398,6 @@ permutation simplepartupdate complexpartupdate c1 c2 read_part
 permutation simplepartupdate_route1to2 complexpartupdate_route_err1 c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_route c1 c2 read_part
 permutation simplepartupdate_noroute complexpartupdate_doesnt_route c1 c2 read_part
+
+permutation sys1 sysupd2 c1 c2
+permutation sys1 sysmerge2 c1 c2
diff --git a/src/test/isolation/specs/inplace-inval.spec b/src/test/isolation/specs/inplace-inval.spec
new file mode 100644
index 0000000..d8e1c98
--- /dev/null
+++ b/src/test/isolation/specs/inplace-inval.spec
@@ -0,0 +1,38 @@
+# If a heap_update() caller retrieves its oldtup from a cache, it's possible
+# for that cache entry to predate an inplace update, causing loss of that
+# inplace update.  This arises because the transaction may abort before
+# sending the inplace invalidation message to the shared queue.
+
+setup
+{
+	CREATE TABLE newly_indexed (c int);
+}
+
+teardown
+{
+	DROP TABLE newly_indexed;
+}
+
+session s1
+step cir1	{ BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK; }
+step read1	{
+	SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass;
+}
+
+session s2
+step cic2	{ CREATE INDEX i2 ON newly_indexed (c); }
+
+session s3
+step cachefill3	{ TABLE newly_indexed; }
+step ddl3		{ ALTER TABLE newly_indexed ADD extra int; }
+
+
+permutation
+	cachefill3	# populates the pg_class row in the catcache
+	cir1	# sets relhasindex=true; rollback discards cache inval
+	cic2	# sees relhasindex=true, skips changing it (so no inval)
+	ddl3	# cached row as the oldtup of an update, losing relhasindex
+	read1	# observe damage XXX is an extant bug
+
+# without cachefill3, no bug
+permutation cir1 cic2 ddl3 read1
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
new file mode 100644
index 0000000..bbecd5d
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -0,0 +1,46 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, namely
+# datfrozenxid, need special code to cope.
+
+setup
+{
+	CREATE ROLE regress_temp_grantee;
+}
+
+teardown
+{
+	REVOKE ALL ON DATABASE isolation_regression FROM regress_temp_grantee;
+	DROP ROLE regress_temp_grantee;
+}
+
+# heap_update(pg_database)
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step vac2	{ VACUUM (FREEZE); }
+
+# observe datfrozenxid
+session s3
+setup	{
+	CREATE TEMP TABLE frozen_witness (x xid);
+}
+step snap3	{
+	INSERT INTO frozen_witness
+	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
+}
+step cmp3	{
+	SELECT 'datfrozenxid retreated'
+	FROM pg_database
+	WHERE datname = current_catalog
+		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
+}
+
+
+# XXX extant bug
+permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..3cd696b
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,153 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  Inplace updates, such as
+# relhasindex=true, need special code to cope.
+
+setup
+{
+	CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+	DROP TABLE IF EXISTS intra_grant_inplace;
+}
+
+# heap_update()
+session s1
+step b1	{ BEGIN; }
+step grant1	{
+	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+}
+step drop1	{
+	DROP TABLE intra_grant_inplace;
+}
+step c1	{ COMMIT; }
+
+# inplace update
+session s2
+step read2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass;
+}
+step b2		{ BEGIN; }
+step addk2	{ ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+step sfnku2	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step c2		{ COMMIT; }
+
+# rowmarks
+session s3
+step b3		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
+step sfnku3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+}
+step sfu3	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
+}
+step r3	{ ROLLBACK; }
+
+# Additional heap_update()
+session s4
+# swallow error message to keep any OID value out of expected output
+step revoke4	{
+	DO $$
+	BEGIN
+		REVOKE SELECT ON intra_grant_inplace FROM PUBLIC;
+	EXCEPTION WHEN others THEN
+		RAISE WARNING 'got: %', regexp_replace(sqlerrm, '[0-9]+', 'REDACTED');
+	END
+	$$;
+}
+
+# Additional rowmarks
+session s5
+setup	{ BEGIN; }
+step keyshr5	{
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+}
+teardown	{ ROLLBACK; }
+
+
+# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+
+permutation
+	b1
+	grant1
+	read2
+	addk2(c1)	# inplace waits
+	c1
+	read2
+
+# inplace thru KEY SHARE
+permutation
+	keyshr5
+	addk2
+
+# inplace wait NO KEY UPDATE w/ KEY SHARE
+permutation
+	keyshr5
+	b3
+	sfnku3
+	addk2(r3)
+	r3
+
+# same-xact rowmark
+permutation
+	b2
+	sfnku2
+	addk2
+	c2
+
+# same-xact rowmark in multixact
+permutation
+	keyshr5
+	b2
+	sfnku2
+	addk2
+	c2
+
+permutation
+	b3
+	sfu3
+	b1
+	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
+	read2
+	addk2(c1)	# block in LockTuple() behind grant1
+	r3			# unblock grant1; addk2 now awaits grant1 xmax
+	c1
+	read2
+
+permutation
+	b2
+	sfnku2
+	b1
+	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
+	addk2			# block in LockTuple() behind grant1 = deadlock
+	c2
+	c1
+	read2
+
+# SearchSysCacheLocked1() calling LockRelease()
+permutation
+	b1
+	grant1
+	b3
+	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1
+	r3			# revoke4 unlocks old tuple and finds new
+
+# SearchSysCacheLocked1() finding a tuple, then no tuple
+permutation
+	b1
+	drop1
+	b3
+	sfu3(c1)		# acquire LockTuple(), await drop1 xmax
+	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	c1				# sfu3 locks none; revoke4 unlocks old and finds none
+	r3
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index ae8dea0..d1ae32d 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -120,6 +120,40 @@ command_ok(
 	[ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ],
 	'compare primary and standby dumps');
 
+# Likewise for the catalogs of the regression database, after disabling
+# autovacuum to make fields like relpages stop changing.
+$node_primary->append_conf('postgresql.conf', 'autovacuum = off');
+$node_primary->restart;
+$node_primary->wait_for_replay_catchup($node_standby_1);
+command_ok(
+	[
+		'pg_dump',
+		('--schema', 'pg_catalog'),
+		('-f', $outputdir . '/catalogs_primary.dump'),
+		'--no-sync',
+		('-p', $node_primary->port),
+		'--no-unlogged-table-data',
+		'regression'
+	],
+	'dump catalogs of primary server');
+command_ok(
+	[
+		'pg_dump',
+		('--schema', 'pg_catalog'),
+		('-f', $outputdir . '/catalogs_standby.dump'),
+		'--no-sync',
+		('-p', $node_standby_1->port),
+		'regression'
+	],
+	'dump catalogs of standby server');
+command_ok(
+	[
+		'diff',
+		$outputdir . '/catalogs_primary.dump',
+		$outputdir . '/catalogs_standby.dump'
+	],
+	'compare primary and standby catalog dumps');
+
 # Check some data from pg_stat_statements.
 $node_primary->safe_psql('postgres', 'CREATE EXTENSION pg_stat_statements');
 # This gathers data based on the first characters for some common query types,
diff --git a/src/test/regress/expected/database.out b/src/test/regress/expected/database.out
new file mode 100644
index 0000000..30c0865
--- /dev/null
+++ b/src/test/regress/expected/database.out
@@ -0,0 +1,14 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+DROP DATABASE regression_utf8;
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index eddc1f4..3d33259 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -2691,6 +2691,30 @@ drop cascades to table measurement_y2007m01
 DROP FUNCTION measurement_insert_trigger();
 -- prepare
 RESET SESSION AUTHORIZATION;
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+    oid    
+-----------
+ pg_depend
+(1 row)
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+ oid  
+------
+ 2608
+(1 row)
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 969ced9..2429ec2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -28,7 +28,7 @@ test: strings md5 numerology point lseg line box path polygon circle date time t
 # geometry depends on point, lseg, line, box, path, polygon, circle
 # horology depends on date, time, timetz, timestamp, timestamptz, interval
 # ----------
-test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc
+test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database
 
 # ----------
 # Load huge amounts of data
diff --git a/src/test/regress/sql/database.sql b/src/test/regress/sql/database.sql
new file mode 100644
index 0000000..6c61f2e
--- /dev/null
+++ b/src/test/regress/sql/database.sql
@@ -0,0 +1,16 @@
+CREATE DATABASE regression_tbd ENCODING utf8 LOCALE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+
+-- Test PgDatabaseToastTable.  Doing this organically and portably would take
+-- a huge relacl, which would be slow.
+BEGIN;
+UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
+WHERE datname = 'regression_utf8';
+-- load catcache entry, if nothing else does
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ROLLBACK;
+
+DROP DATABASE regression_utf8;
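
As a sanity check (not part of the test), one can confirm pg_database has a
TOAST table and that this UPDATE really stores datcollversion out of line;
pg_toast_1262 is the stock toast relation, and ~2kB the usual inline
threshold:

-- pg_database's TOAST table:
SELECT reltoastrelid::regclass FROM pg_class
WHERE oid = 'pg_database'::regclass;

-- Within the test's transaction, the stored (compressed) size should still
-- exceed the inline threshold, forcing out-of-line storage:
BEGIN;
UPDATE pg_database SET datcollversion = repeat('a', 6e6::int)
WHERE datname = 'regression_utf8';
SELECT pg_column_size(datcollversion) FROM pg_database
WHERE datname = 'regression_utf8';
ROLLBACK;
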
diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql
index 3d5d854..92163ec 100644
--- a/src/test/regress/sql/merge.sql
+++ b/src/test/regress/sql/merge.sql
@@ -1713,6 +1713,23 @@ DROP FUNCTION measurement_insert_trigger();
 -- prepare
 
 RESET SESSION AUTHORIZATION;
+
+-- try a system catalog
+MERGE INTO pg_class c
+USING (SELECT 'pg_depend'::regclass AS oid) AS j
+ON j.oid = c.oid
+WHEN MATCHED THEN
+	UPDATE SET reltuples = reltuples + 1
+RETURNING j.oid;
+
+CREATE VIEW classv AS SELECT * FROM pg_class;
+MERGE INTO classv c
+USING pg_namespace n
+ON n.oid = c.relnamespace
+WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
+	UPDATE SET reltuples = reltuples - 1
+RETURNING c.oid;
+
 DROP TABLE target, target2;
 DROP TABLE source, source2;
 DROP FUNCTION merge_trigfunc();
inplace031-inj-wait-event-v4.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Add wait event type "InjectionPoint", a custom type like "Extension".
    
    Both injection points and customization of type "Extension" are new in
    v17, so this just changes a detail of an unreleased feature.
    
    Reported by Robert Haas.  Reviewed by Michael Paquier.
    
    Discussion: https://postgr.es/m/CA+TgmobfMU5pdXP36D5iAwxV5WKE_vuDLtp_1QyH+H5jMMt21g@mail.gmail.com

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 053da8d..8233f98 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1064,6 +1064,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
      <row>
+      <entry><literal>InjectionPoint</literal></entry>
+      <entry>The server process is waiting for an injection point to reach an
+       outcome defined in a test.  See
+       <xref linkend="xfunc-addin-injection-points"/> for more details.  This
+       type has no predefined wait points.
+      </entry>
+     </row>
+     <row>
       <entry><literal>IO</literal></entry>
       <entry>The server process is waiting for an I/O operation to complete.
        <literal>wait_event</literal> will identify the specific wait point;
@@ -1139,8 +1147,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
    <note>
     <para>
-     Extensions can add <literal>Extension</literal> and
-     <literal>LWLock</literal> events
+     Extensions can add <literal>Extension</literal>,
+     <literal>InjectionPoint</literal>, and <literal>LWLock</literal> events
      to the lists shown in <xref linkend="wait-event-extension-table"/> and
      <xref linkend="wait-event-lwlock-table"/>. In some cases, the name
      of an <literal>LWLock</literal> assigned by an extension will not be
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index a7c1704..66c1c30 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3643,7 +3643,11 @@ extern void InjectionPointAttach(const char *name,
 static void
 custom_injection_callback(const char *name, const void *private_data)
 {
+    uint32 wait_event_info = WaitEventInjectionPointNew(name);
+
+    pgstat_report_wait_start(wait_event_info);
     elog(NOTICE, "%s: executed custom callback", name);
+    pgstat_report_wait_end();
 }
 </programlisting>
      This callback prints a message to server error log with severity
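
With the callback shaped like this, a backend sleeping in a "wait"-mode
injection point becomes observable from SQL.  A rough recipe using the
injection_points test module (the point name is a placeholder, and the
server must be built with --enable-injection-points):

-- Requires src/test/modules/injection_points to be installed.
CREATE EXTENSION injection_points;
SELECT injection_points_attach('my-point', 'wait');

-- From another session, while some backend is blocked in the point:
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity WHERE wait_event_type = 'InjectionPoint';

-- Release the waiter:
SELECT injection_points_wakeup('my-point');
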
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 521ed54..2100150 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -149,7 +149,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, SyncScanShmemSize());
 	size = add_size(size, AsyncShmemSize());
 	size = add_size(size, StatsShmemSize());
-	size = add_size(size, WaitEventExtensionShmemSize());
+	size = add_size(size, WaitEventCustomShmemSize());
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
 #ifdef EXEC_BACKEND
@@ -355,7 +355,7 @@ CreateOrAttachShmemStructs(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	StatsShmemInit();
-	WaitEventExtensionShmemInit();
+	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
 }
 
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 42f36f4..6a9c0a5 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -181,9 +181,10 @@ if ($gen_code)
 	foreach my $waitclass (sort { uc($a) cmp uc($b) } keys %hashwe)
 	{
 		# Don't generate the pgstat_wait_event.c and wait_event_types.h files
-		# for Extension, LWLock and Lock, these are handled independently.
+		# for types handled independently.
 		next
 		  if ( $waitclass eq 'WaitEventExtension'
+			|| $waitclass eq 'WaitEventInjectionPoint'
 			|| $waitclass eq 'WaitEventLWLock'
 			|| $waitclass eq 'WaitEventLock');
 
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 084a9df..bbf5948 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -47,68 +47,69 @@ uint32	   *my_wait_event_info = &local_my_wait_event_info;
  * Hash tables for storing custom wait event ids and their names in
  * shared memory.
  *
- * WaitEventExtensionHashById is used to find the name from an event id.
- * Any backend can search it to find custom wait events.
+ * WaitEventCustomHashByInfo is used to find the name from wait event
+ * information.  Any backend can search it to find custom wait events.
  *
- * WaitEventExtensionHashByName is used to find the event ID from a name.
- * It is used to ensure that no duplicated entries are registered.
+ * WaitEventCustomHashByName is used to find the wait event information from a
+ * name.  It is used to ensure that no duplicated entries are registered.
+ *
+ * For simplicity, we use the same ID counter across types of custom events.
+ * We could end that anytime the need arises.
  *
  * The size of the hash table is based on the assumption that
- * WAIT_EVENT_EXTENSION_HASH_INIT_SIZE is enough for most cases, and it seems
+ * WAIT_EVENT_CUSTOM_HASH_INIT_SIZE is enough for most cases, and it seems
  * unlikely that the number of entries will reach
- * WAIT_EVENT_EXTENSION_HASH_MAX_SIZE.
+ * WAIT_EVENT_CUSTOM_HASH_MAX_SIZE.
  */
-static HTAB *WaitEventExtensionHashById;	/* find names from IDs */
-static HTAB *WaitEventExtensionHashByName;	/* find IDs from names */
+static HTAB *WaitEventCustomHashByInfo; /* find names from infos */
+static HTAB *WaitEventCustomHashByName; /* find infos from names */
 
-#define WAIT_EVENT_EXTENSION_HASH_INIT_SIZE	16
-#define WAIT_EVENT_EXTENSION_HASH_MAX_SIZE	128
+#define WAIT_EVENT_CUSTOM_HASH_INIT_SIZE	16
+#define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE	128
 
 /* hash table entries */
-typedef struct WaitEventExtensionEntryById
+typedef struct WaitEventCustomEntryByInfo
 {
-	uint16		event_id;		/* hash key */
+	uint32		wait_event_info;	/* hash key */
 	char		wait_event_name[NAMEDATALEN];	/* custom wait event name */
-} WaitEventExtensionEntryById;
+} WaitEventCustomEntryByInfo;
 
-typedef struct WaitEventExtensionEntryByName
+typedef struct WaitEventCustomEntryByName
 {
 	char		wait_event_name[NAMEDATALEN];	/* hash key */
-	uint16		event_id;		/* wait event ID */
-} WaitEventExtensionEntryByName;
+	uint32		wait_event_info;
+} WaitEventCustomEntryByName;
 
 
-/* dynamic allocation counter for custom wait events in extensions */
-typedef struct WaitEventExtensionCounterData
+/* dynamic allocation counter for custom wait events */
+typedef struct WaitEventCustomCounterData
 {
 	int			nextId;			/* next ID to assign */
 	slock_t		mutex;			/* protects the counter */
-} WaitEventExtensionCounterData;
+} WaitEventCustomCounterData;
 
 /* pointer to the shared memory */
-static WaitEventExtensionCounterData *WaitEventExtensionCounter;
+static WaitEventCustomCounterData *WaitEventCustomCounter;
 
-/* first event ID of custom wait events for extensions */
-#define WAIT_EVENT_EXTENSION_INITIAL_ID	1
+/* first event ID of custom wait events */
+#define WAIT_EVENT_CUSTOM_INITIAL_ID	1
 
-/* wait event info for extensions */
-#define WAIT_EVENT_EXTENSION_INFO(eventId)	(PG_WAIT_EXTENSION | eventId)
-
-static const char *GetWaitEventExtensionIdentifier(uint16 eventId);
+static uint32 WaitEventCustomNew(uint32 classId, const char *wait_event_name);
+static const char *GetWaitEventCustomIdentifier(uint32 wait_event_info);
 
 /*
  *  Return the space for dynamic shared hash tables and dynamic allocation counter.
  */
 Size
-WaitEventExtensionShmemSize(void)
+WaitEventCustomShmemSize(void)
 {
 	Size		sz;
 
-	sz = MAXALIGN(sizeof(WaitEventExtensionCounterData));
-	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-										 sizeof(WaitEventExtensionEntryById)));
-	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-										 sizeof(WaitEventExtensionEntryByName)));
+	sz = MAXALIGN(sizeof(WaitEventCustomCounterData));
+	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+										 sizeof(WaitEventCustomEntryByInfo)));
+	sz = add_size(sz, hash_estimate_size(WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+										 sizeof(WaitEventCustomEntryByName)));
 	return sz;
 }
 
@@ -116,39 +117,41 @@ WaitEventExtensionShmemSize(void)
  * Allocate shmem space for dynamic shared hash and dynamic allocation counter.
  */
 void
-WaitEventExtensionShmemInit(void)
+WaitEventCustomShmemInit(void)
 {
 	bool		found;
 	HASHCTL		info;
 
-	WaitEventExtensionCounter = (WaitEventExtensionCounterData *)
-		ShmemInitStruct("WaitEventExtensionCounterData",
-						sizeof(WaitEventExtensionCounterData), &found);
+	WaitEventCustomCounter = (WaitEventCustomCounterData *)
+		ShmemInitStruct("WaitEventCustomCounterData",
+						sizeof(WaitEventCustomCounterData), &found);
 
 	if (!found)
 	{
 		/* initialize the allocation counter and its spinlock. */
-		WaitEventExtensionCounter->nextId = WAIT_EVENT_EXTENSION_INITIAL_ID;
-		SpinLockInit(&WaitEventExtensionCounter->mutex);
+		WaitEventCustomCounter->nextId = WAIT_EVENT_CUSTOM_INITIAL_ID;
+		SpinLockInit(&WaitEventCustomCounter->mutex);
 	}
 
 	/* initialize or attach the hash tables to store custom wait events */
-	info.keysize = sizeof(uint16);
-	info.entrysize = sizeof(WaitEventExtensionEntryById);
-	WaitEventExtensionHashById = ShmemInitHash("WaitEventExtension hash by id",
-											   WAIT_EVENT_EXTENSION_HASH_INIT_SIZE,
-											   WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-											   &info,
-											   HASH_ELEM | HASH_BLOBS);
+	info.keysize = sizeof(uint32);
+	info.entrysize = sizeof(WaitEventCustomEntryByInfo);
+	WaitEventCustomHashByInfo =
+		ShmemInitHash("WaitEventCustom hash by wait event information",
+					  WAIT_EVENT_CUSTOM_HASH_INIT_SIZE,
+					  WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+					  &info,
+					  HASH_ELEM | HASH_BLOBS);
 
 	/* key is a NULL-terminated string */
 	info.keysize = sizeof(char[NAMEDATALEN]);
-	info.entrysize = sizeof(WaitEventExtensionEntryByName);
-	WaitEventExtensionHashByName = ShmemInitHash("WaitEventExtension hash by name",
-												 WAIT_EVENT_EXTENSION_HASH_INIT_SIZE,
-												 WAIT_EVENT_EXTENSION_HASH_MAX_SIZE,
-												 &info,
-												 HASH_ELEM | HASH_STRINGS);
+	info.entrysize = sizeof(WaitEventCustomEntryByName);
+	WaitEventCustomHashByName =
+		ShmemInitHash("WaitEventCustom hash by name",
+					  WAIT_EVENT_CUSTOM_HASH_INIT_SIZE,
+					  WAIT_EVENT_CUSTOM_HASH_MAX_SIZE,
+					  &info,
+					  HASH_ELEM | HASH_STRINGS);
 }
 
 /*
@@ -160,10 +163,23 @@ WaitEventExtensionShmemInit(void)
 uint32
 WaitEventExtensionNew(const char *wait_event_name)
 {
+	return WaitEventCustomNew(PG_WAIT_EXTENSION, wait_event_name);
+}
+
+uint32
+WaitEventInjectionPointNew(const char *wait_event_name)
+{
+	return WaitEventCustomNew(PG_WAIT_INJECTIONPOINT, wait_event_name);
+}
+
+static uint32
+WaitEventCustomNew(uint32 classId, const char *wait_event_name)
+{
 	uint16		eventId;
 	bool		found;
-	WaitEventExtensionEntryByName *entry_by_name;
-	WaitEventExtensionEntryById *entry_by_id;
+	WaitEventCustomEntryByName *entry_by_name;
+	WaitEventCustomEntryByInfo *entry_by_info;
+	uint32		wait_event_info;
 
 	/* Check the limit of the length of the event name */
 	if (strlen(wait_event_name) >= NAMEDATALEN)
@@ -175,13 +191,24 @@ WaitEventExtensionNew(const char *wait_event_name)
 	 * Check if the wait event info associated to the name is already defined,
 	 * and return it if so.
 	 */
-	LWLockAcquire(WaitEventExtensionLock, LW_SHARED);
-	entry_by_name = (WaitEventExtensionEntryByName *)
-		hash_search(WaitEventExtensionHashByName, wait_event_name,
+	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
+	entry_by_name = (WaitEventCustomEntryByName *)
+		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_FIND, &found);
-	LWLockRelease(WaitEventExtensionLock);
+	LWLockRelease(WaitEventCustomLock);
 	if (found)
-		return WAIT_EVENT_EXTENSION_INFO(entry_by_name->event_id);
+	{
+		uint32		oldClassId;
+
+		oldClassId = entry_by_name->wait_event_info & WAIT_EVENT_CLASS_MASK;
+		if (oldClassId != classId)
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("wait event \"%s\" already exists in type \"%s\"",
+							wait_event_name,
+							pgstat_get_wait_event_type(entry_by_name->wait_event_info))));
+		return entry_by_name->wait_event_info;
+	}
 
 	/*
 	 * Allocate and register a new wait event.  Recheck if the event name
@@ -189,113 +216,123 @@ WaitEventExtensionNew(const char *wait_event_name)
 	 * one with the same name since the LWLock acquired again here was
 	 * previously released.
 	 */
-	LWLockAcquire(WaitEventExtensionLock, LW_EXCLUSIVE);
-	entry_by_name = (WaitEventExtensionEntryByName *)
-		hash_search(WaitEventExtensionHashByName, wait_event_name,
+	LWLockAcquire(WaitEventCustomLock, LW_EXCLUSIVE);
+	entry_by_name = (WaitEventCustomEntryByName *)
+		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_FIND, &found);
 	if (found)
 	{
-		LWLockRelease(WaitEventExtensionLock);
-		return WAIT_EVENT_EXTENSION_INFO(entry_by_name->event_id);
+		uint32		oldClassId;
+
+		LWLockRelease(WaitEventCustomLock);
+		oldClassId = entry_by_name->wait_event_info & WAIT_EVENT_CLASS_MASK;
+		if (oldClassId != classId)
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("wait event \"%s\" already exists in type \"%s\"",
+							wait_event_name,
+							pgstat_get_wait_event_type(entry_by_name->wait_event_info))));
+		return entry_by_name->wait_event_info;
 	}
 
 	/* Allocate a new event Id */
-	SpinLockAcquire(&WaitEventExtensionCounter->mutex);
+	SpinLockAcquire(&WaitEventCustomCounter->mutex);
 
-	if (WaitEventExtensionCounter->nextId >= WAIT_EVENT_EXTENSION_HASH_MAX_SIZE)
+	if (WaitEventCustomCounter->nextId >= WAIT_EVENT_CUSTOM_HASH_MAX_SIZE)
 	{
-		SpinLockRelease(&WaitEventExtensionCounter->mutex);
+		SpinLockRelease(&WaitEventCustomCounter->mutex);
 		ereport(ERROR,
 				errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				errmsg("too many wait events for extensions"));
+				errmsg("too many custom wait events"));
 	}
 
-	eventId = WaitEventExtensionCounter->nextId++;
+	eventId = WaitEventCustomCounter->nextId++;
 
-	SpinLockRelease(&WaitEventExtensionCounter->mutex);
+	SpinLockRelease(&WaitEventCustomCounter->mutex);
 
 	/* Register the new wait event */
-	entry_by_id = (WaitEventExtensionEntryById *)
-		hash_search(WaitEventExtensionHashById, &eventId,
+	wait_event_info = classId | eventId;
+	entry_by_info = (WaitEventCustomEntryByInfo *)
+		hash_search(WaitEventCustomHashByInfo, &wait_event_info,
 					HASH_ENTER, &found);
 	Assert(!found);
-	strlcpy(entry_by_id->wait_event_name, wait_event_name,
-			sizeof(entry_by_id->wait_event_name));
+	strlcpy(entry_by_info->wait_event_name, wait_event_name,
+			sizeof(entry_by_info->wait_event_name));
 
-	entry_by_name = (WaitEventExtensionEntryByName *)
-		hash_search(WaitEventExtensionHashByName, wait_event_name,
+	entry_by_name = (WaitEventCustomEntryByName *)
+		hash_search(WaitEventCustomHashByName, wait_event_name,
 					HASH_ENTER, &found);
 	Assert(!found);
-	entry_by_name->event_id = eventId;
+	entry_by_name->wait_event_info = wait_event_info;
 
-	LWLockRelease(WaitEventExtensionLock);
+	LWLockRelease(WaitEventCustomLock);
 
-	return WAIT_EVENT_EXTENSION_INFO(eventId);
+	return wait_event_info;
 }
 
 /*
- * Return the name of an wait event ID for extension.
+ * Return the name of a custom wait event, given its wait event information.
  */
 static const char *
-GetWaitEventExtensionIdentifier(uint16 eventId)
+GetWaitEventCustomIdentifier(uint32 wait_event_info)
 {
 	bool		found;
-	WaitEventExtensionEntryById *entry;
+	WaitEventCustomEntryByInfo *entry;
 
 	/* Built-in event? */
-	if (eventId < WAIT_EVENT_EXTENSION_INITIAL_ID)
+	if (wait_event_info == PG_WAIT_EXTENSION)
 		return "Extension";
 
 	/* It is a user-defined wait event, so lookup hash table. */
-	LWLockAcquire(WaitEventExtensionLock, LW_SHARED);
-	entry = (WaitEventExtensionEntryById *)
-		hash_search(WaitEventExtensionHashById, &eventId,
+	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
+	entry = (WaitEventCustomEntryByInfo *)
+		hash_search(WaitEventCustomHashByInfo, &wait_event_info,
 					HASH_FIND, &found);
-	LWLockRelease(WaitEventExtensionLock);
+	LWLockRelease(WaitEventCustomLock);
 
 	if (!entry)
-		elog(ERROR, "could not find custom wait event name for ID %u",
-			 eventId);
+		elog(ERROR,
+			 "could not find custom name for wait event information %u",
+			 wait_event_info);
 
 	return entry->wait_event_name;
 }
 
 
 /*
- * Returns a list of currently defined custom wait event names for extensions.
- * The result is a palloc'd array, with the number of elements saved in
- * *nwaitevents.
+ * Returns a list of currently defined custom wait event names.  The result is
+ * a palloc'd array, with the number of elements saved in *nwaitevents.
  */
 char	  **
-GetWaitEventExtensionNames(int *nwaitevents)
+GetWaitEventCustomNames(uint32 classId, int *nwaitevents)
 {
 	char	  **waiteventnames;
-	WaitEventExtensionEntryByName *hentry;
+	WaitEventCustomEntryByName *hentry;
 	HASH_SEQ_STATUS hash_seq;
 	int			index;
 	int			els;
 
-	LWLockAcquire(WaitEventExtensionLock, LW_SHARED);
+	LWLockAcquire(WaitEventCustomLock, LW_SHARED);
 
 	/* Now we can safely count the number of entries */
-	els = hash_get_num_entries(WaitEventExtensionHashByName);
+	els = hash_get_num_entries(WaitEventCustomHashByName);
 
 	/* Allocate enough space for all entries */
 	waiteventnames = palloc(els * sizeof(char *));
 
 	/* Now scan the hash table to copy the data */
-	hash_seq_init(&hash_seq, WaitEventExtensionHashByName);
+	hash_seq_init(&hash_seq, WaitEventCustomHashByName);
 
 	index = 0;
-	while ((hentry = (WaitEventExtensionEntryByName *) hash_seq_search(&hash_seq)) != NULL)
+	while ((hentry = (WaitEventCustomEntryByName *) hash_seq_search(&hash_seq)) != NULL)
 	{
+		if ((hentry->wait_event_info & WAIT_EVENT_CLASS_MASK) != classId)
+			continue;
 		waiteventnames[index] = pstrdup(hentry->wait_event_name);
 		index++;
 	}
 
-	LWLockRelease(WaitEventExtensionLock);
-
-	Assert(index == els);
+	LWLockRelease(WaitEventCustomLock);
 
 	*nwaitevents = index;
 	return waiteventnames;
@@ -374,6 +411,9 @@ pgstat_get_wait_event_type(uint32 wait_event_info)
 		case PG_WAIT_IO:
 			event_type = "IO";
 			break;
+		case PG_WAIT_INJECTIONPOINT:
+			event_type = "InjectionPoint";
+			break;
 		default:
 			event_type = "???";
 			break;
@@ -411,7 +451,8 @@ pgstat_get_wait_event(uint32 wait_event_info)
 			event_name = GetLockNameFromTagType(eventId);
 			break;
 		case PG_WAIT_EXTENSION:
-			event_name = GetWaitEventExtensionIdentifier(eventId);
+		case PG_WAIT_INJECTIONPOINT:
+			event_name = GetWaitEventCustomIdentifier(wait_event_info);
 			break;
 		case PG_WAIT_BUFFERPIN:
 			{
diff --git a/src/backend/utils/activity/wait_event_funcs.c b/src/backend/utils/activity/wait_event_funcs.c
index ba244c2..fa8bc05 100644
--- a/src/backend/utils/activity/wait_event_funcs.c
+++ b/src/backend/utils/activity/wait_event_funcs.c
@@ -48,7 +48,7 @@ pg_get_wait_events(PG_FUNCTION_ARGS)
 #define PG_GET_WAIT_EVENTS_COLS 3
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	char	  **waiteventnames;
-	int			nbextwaitevents;
+	int			nbwaitevents;
 
 	/* Build tuplestore to hold the result rows */
 	InitMaterializedSRF(fcinfo, 0);
@@ -67,9 +67,10 @@ pg_get_wait_events(PG_FUNCTION_ARGS)
 	}
 
 	/* Handle custom wait events for extensions */
-	waiteventnames = GetWaitEventExtensionNames(&nbextwaitevents);
+	waiteventnames = GetWaitEventCustomNames(PG_WAIT_EXTENSION,
+											 &nbwaitevents);
 
-	for (int idx = 0; idx < nbextwaitevents; idx++)
+	for (int idx = 0; idx < nbwaitevents; idx++)
 	{
 		StringInfoData buf;
 		Datum		values[PG_GET_WAIT_EVENTS_COLS] = {0};
@@ -89,5 +90,29 @@ pg_get_wait_events(PG_FUNCTION_ARGS)
 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
 	}
 
+	/* Likewise for injection points */
+	waiteventnames = GetWaitEventCustomNames(PG_WAIT_INJECTIONPOINT,
+											 &nbwaitevents);
+
+	for (int idx = 0; idx < nbwaitevents; idx++)
+	{
+		StringInfoData buf;
+		Datum		values[PG_GET_WAIT_EVENTS_COLS] = {0};
+		bool		nulls[PG_GET_WAIT_EVENTS_COLS] = {0};
+
+
+		values[0] = CStringGetTextDatum("InjectionPoint");
+		values[1] = CStringGetTextDatum(waiteventnames[idx]);
+
+		initStringInfo(&buf);
+		appendStringInfo(&buf,
+						 "Waiting for injection point \"%s\"",
+						 waiteventnames[idx]);
+
+		values[2] = CStringGetTextDatum(buf.data);
+
+		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+	}
+
 	return (Datum) 0;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 87cbca2..db37bee 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -340,7 +340,7 @@ LogicalRepWorker	"Waiting to read or update the state of logical replication wor
 XactTruncation	"Waiting to execute <function>pg_xact_status</function> or update the oldest transaction ID available to it."
 WrapLimitsVacuum	"Waiting to update limits on transaction id and multixact consumption."
 NotifyQueueTail	"Waiting to update limit on <command>NOTIFY</command> message storage."
-WaitEventExtension	"Waiting to read or update custom wait events information for extensions."
+WaitEventCustom	"Waiting to read or update custom wait events information."
 WALSummarizer	"Waiting to read or update WAL summarization state."
 DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 85f6568..6a2f64c 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -78,7 +78,7 @@ PG_LWLOCK(44, XactTruncation)
 /* 45 was XactTruncationLock until removal of BackendRandomLock */
 PG_LWLOCK(46, WrapLimitsVacuum)
 PG_LWLOCK(47, NotifyQueueTail)
-PG_LWLOCK(48, WaitEventExtension)
+PG_LWLOCK(48, WaitEventCustom)
 PG_LWLOCK(49, WALSummarizer)
 PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 1b735d4..9f18a75 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -24,6 +24,7 @@
 #define PG_WAIT_IPC					0x08000000U
 #define PG_WAIT_TIMEOUT				0x09000000U
 #define PG_WAIT_IO					0x0A000000U
+#define PG_WAIT_INJECTIONPOINT		0x0B000000U
 
 /* enums for wait events */
 #include "utils/wait_event_types.h"
@@ -38,26 +39,28 @@ extern void pgstat_reset_wait_event_storage(void);
 extern PGDLLIMPORT uint32 *my_wait_event_info;
 
 
-/* ----------
- * Wait Events - Extension
+/*
+ * Wait Events - Extension, InjectionPoint
  *
- * Use this category when the server process is waiting for some condition
- * defined by an extension module.
+ * Use InjectionPoint when the server process is waiting in an injection
+ * point.  Use Extension for other cases of the server process waiting for
+ * some condition defined by an extension module.
  *
- * Extensions can define their own wait events in this category.  They should
- * call WaitEventExtensionNew() with a wait event string.  If the wait event
- * associated to a string is already allocated, it returns the wait event
- * information to use.  If not, it gets one wait event ID allocated from
+ * Extensions can define their own wait events in these categories.  They
+ * should call one of these functions with a wait event string.  If the wait
+ * event associated to a string is already allocated, it returns the wait
+ * event information to use.  If not, it gets one wait event ID allocated from
  * a shared counter, associates the string to the ID in the shared dynamic
  * hash and returns the wait event information.
  *
  * The ID retrieved can be used with pgstat_report_wait_start() or equivalent.
  */
-extern void WaitEventExtensionShmemInit(void);
-extern Size WaitEventExtensionShmemSize(void);
-
 extern uint32 WaitEventExtensionNew(const char *wait_event_name);
-extern char **GetWaitEventExtensionNames(int *nwaitevents);
+extern uint32 WaitEventInjectionPointNew(const char *wait_event_name);
+
+extern void WaitEventCustomShmemInit(void);
+extern Size WaitEventCustomShmemSize(void);
+extern char **GetWaitEventCustomNames(uint32 classId, int *nwaitevents);
 
 /* ----------
  * pgstat_report_wait_start() -
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 5c44625..1b695a1 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -216,7 +216,7 @@ injection_wait(const char *name, const void *private_data)
 	 * this custom wait event name is not released, but we don't care much for
 	 * testing as this should be short-lived.
 	 */
-	injection_wait_event = WaitEventExtensionNew(name);
+	injection_wait_event = WaitEventInjectionPointNew(name);
 
 	/*
 	 * Find a free slot to wait for, and register this injection point's name.
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index dbfd0c13..2176a54 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -158,9 +158,10 @@ select name, setting from pg_settings where name like 'enable%';
  enable_tidscan                 | on
 (22 rows)
 
--- There are always wait event descriptions for various types.
+-- There are always wait event descriptions for various types.  InjectionPoint
+-- may be present or absent, depending on history since last postmaster start.
 select type, count(*) > 0 as ok FROM pg_wait_events
-  group by type order by type COLLATE "C";
+  where type <> 'InjectionPoint' group by type order by type COLLATE "C";
    type    | ok 
 -----------+----
  Activity  | t
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index c4f59dd..b047fb5 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -70,9 +70,10 @@ select count(*) = 0 as ok from pg_stat_wal_receiver;
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
 
--- There are always wait event descriptions for various types.
+-- There are always wait event descriptions for various types.  InjectionPoint
+-- may be present or absent, depending on history since last postmaster start.
 select type, count(*) > 0 as ok FROM pg_wait_events
-  group by type order by type COLLATE "C";
+  where type <> 'InjectionPoint' group by type order by type COLLATE "C";
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 61ad417..75433b3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3099,9 +3099,9 @@ WaitEvent
 WaitEventActivity
 WaitEventBufferPin
 WaitEventClient
-WaitEventExtensionCounterData
-WaitEventExtensionEntryById
-WaitEventExtensionEntryByName
+WaitEventCustomCounterData
+WaitEventCustomEntryByInfo
+WaitEventCustomEntryByName
 WaitEventIO
 WaitEventIPC
 WaitEventSet
#40Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#39)
Re: race condition in pg_class

On Fri, Jun 21, 2024 at 02:28:42PM -0700, Noah Misch wrote:
> On Thu, Jun 13, 2024 at 05:35:49PM -0700, Noah Misch wrote:
> > I think the attached covers all comments to date. I gave everything v3, but
> > most patches have just a no-conflict rebase vs. v2. The exceptions are
> > inplace031-inj-wait-event (implements the holding from that branch of the
> > thread) and inplace050-tests-inj (updated to cooperate with inplace031). Much
> > of inplace031-inj-wait-event is essentially s/Extension/Custom/ for the
> > infrastructure common to the two custom wait event types.
>
> Starting 2024-06-27, I'd like to push
> inplace080-catcache-detoast-inplace-stale and earlier patches, self-certifying
> them if needed. Then I'll submit the last three to the commitfest. Does
> anyone want me to delay that step?

Pushed. Buildfarm member prion is failing the new inplace-inval.spec, almost
surely because prion uses -DCATCACHE_FORCE_RELEASE and inplace-inval.spec is
testing an extant failure to inval a cache entry. Naturally, inexorable inval
masks the extant bug. Ideally, I'd just skip the test under any kind of cache
clobber option. I don't know a pleasant way to do that, so these are
known-feasible things I'm considering:

1. Neutralize the test in all branches, probably by having it just not report
the final answer. Undo in the later fix patch.

2. v14+ has pg_backend_memory_contexts. In the test, run some plpgsql that
uses heuristics on that to deduce whether caches are getting released.
Have a separate expected output for the cache-release scenario. Perhaps
also have the test treat installcheck like cache-release, since
installcheck could experience sinval reset with similar consequences.
Neutralize the test in v12 & v13.

3. Add a test module with a C function that reports whether any kind of cache
clobber is active. Call it in this test. Have a separate expected output
for the cache-release scenario.
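
For (2), the plpgsql could look roughly like the following.  The context name
is real, but the workload and the 64kB growth threshold are guesses I'd have
to tune, not a worked-out test:

DO $$
DECLARE
  before_bytes bigint;
  after_bytes bigint;
BEGIN
  SELECT sum(total_bytes) INTO before_bytes
  FROM pg_backend_memory_contexts WHERE name = 'CacheMemoryContext';

  -- Populate many RELOID syscache entries; under CATCACHE_FORCE_RELEASE
  -- they are released again as soon as each lookup finishes.
  PERFORM c.oid::regclass::text FROM pg_class c;

  SELECT sum(total_bytes) INTO after_bytes
  FROM pg_backend_memory_contexts WHERE name = 'CacheMemoryContext';

  RAISE NOTICE 'caches appear %',
    CASE WHEN after_bytes > before_bytes + 65536
      THEN 'retained' ELSE 'released' END;
END
$$;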

Preferences or other ideas? I'm waffling between (1) and (2). I'll give it
more thought over the next day.

Thanks,
nm

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#40)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:
> Pushed. Buildfarm member prion is failing the new inplace-inval.spec, almost
> surely because prion uses -DCATCACHE_FORCE_RELEASE and inplace-inval.spec is
> testing an extant failure to inval a cache entry. Naturally, inexorable inval
> masks the extant bug. Ideally, I'd just skip the test under any kind of cache
> clobber option. I don't know a pleasant way to do that, so these are
> known-feasible things I'm considering:
>
> 1. Neutralize the test in all branches, probably by having it just not report
> the final answer. Undo in the later fix patch.
>
> 2. v14+ has pg_backend_memory_contexts. In the test, run some plpgsql that
> uses heuristics on that to deduce whether caches are getting released.
> Have a separate expected output for the cache-release scenario. Perhaps
> also have the test treat installcheck like cache-release, since
> installcheck could experience sinval reset with similar consequences.
> Neutralize the test in v12 & v13.
>
> 3. Add a test module with a C function that reports whether any kind of cache
> clobber is active. Call it in this test. Have a separate expected output
> for the cache-release scenario.
>
> Preferences or other ideas? I'm waffling between (1) and (2). I'll give it
> more thought over the next day.

I'd just go for (1). We were doing fine without this test case.
I can't see expending effort towards hiding its result rather
than actually fixing anything.

regards, tom lane

#42Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#41)
3 attachment(s)
Re: race condition in pg_class

On Fri, Jun 28, 2024 at 01:17:22AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Pushed. Buildfarm member prion is failing the new inplace-inval.spec, almost
> > surely because prion uses -DCATCACHE_FORCE_RELEASE and inplace-inval.spec is
> > testing an extant failure to inval a cache entry. Naturally, inexorable inval
> > masks the extant bug. Ideally, I'd just skip the test under any kind of cache
> > clobber option. I don't know a pleasant way to do that, so these are
> > known-feasible things I'm considering:
> >
> > 1. Neutralize the test in all branches, probably by having it just not report
> > the final answer. Undo in the later fix patch.
> >
> > 2. v14+ has pg_backend_memory_contexts. In the test, run some plpgsql that
> > uses heuristics on that to deduce whether caches are getting released.
> > Have a separate expected output for the cache-release scenario. Perhaps
> > also have the test treat installcheck like cache-release, since
> > installcheck could experience sinval reset with similar consequences.
> > Neutralize the test in v12 & v13.
> >
> > 3. Add a test module with a C function that reports whether any kind of cache
> > clobber is active. Call it in this test. Have a separate expected output
> > for the cache-release scenario.
> >
> > Preferences or other ideas? I'm waffling between (1) and (2). I'll give it
> > more thought over the next day.
>
> I'd just go for (1). We were doing fine without this test case.
> I can't see expending effort towards hiding its result rather
> than actually fixing anything.

Good point, any effort on (2) would be wasted once the fixes get certified. I
pushed (1). I'm attaching the rebased fix patches.

Attachments:

inplace090-LOCKTAG_TUPLE-eoxact-v5.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..461d925 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
inplace110-successors-v5.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com
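
From SQL, the new behavior looks roughly like this ("tbl" stands for any
ordinary table; a sketch of the intended post-patch interaction, not a test):

-- Session A: the uncommitted GRANT holds the pg_class row via its xmax.
BEGIN;
GRANT SELECT ON tbl TO PUBLIC;

-- Session B: the inplace update of tbl's pg_class row (relpages etc.)
-- now waits for A instead of overwriting a superseded row version:
VACUUM tbl;

-- Session A: committing releases B, which then retries its scan and
-- overwrites the committed tuple.
COMMIT;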

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
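
From SQL, an explicit rowmark is enough to drive these rules, which is how
the intra-grant-inplace spec above exercises them (superuser sketch of the
intended post-patch behavior, mirroring the spec's sfu3/addk2 steps):

-- Session A: FOR UPDATE stamps the catalog row's xmax with a rowmark.
BEGIN;
SELECT relhasindex FROM pg_class
WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;

-- Session B: the inplace updater sees the incompatible xmax and waits
-- for A to end before retrying its scan:
ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);

-- Session A:
ROLLBACK;
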
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..107507e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4072,6 +4076,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6041,34 +6084,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6081,21 +6135,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6210,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6243,188 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress:
+ * all inplace updaters take a lock that conflicts with DROP.  If a delete
+ * does happen somehow, we'll wait for it like we would an update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would tolerate either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
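
To make the new API's contract concrete before the call-site changes below: a
caller drives heap_inplace_update_scan() / _finish() / _cancel() roughly as
follows.  This is an illustrative sketch only (the "dirty" flag and the error
text are placeholders), not part of the patch; the real call sites appear in
the per-file diffs that follow.

	ScanKeyData key[1];
	HeapTuple	tup;
	void	   *state;

	ScanKeyInit(&key[0], Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid));
	/* returns with buffer exclusive-locked; tuple safe from heap_update() */
	heap_inplace_update_scan(pg_class, ClassOidIndexId, true,
							 NULL, 1, key, &tup, &state);
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "could not find tuple for relation %u", relid);

	/* ... modify fields of "tup" that never change the tuple's length ... */

	if (dirty)
		heap_inplace_update_finish(state, tup);	/* writes WAL, sends inval */
	else
		heap_inplace_update_cancel(state);		/* only releases the locks */
	heap_freetuple(tup);
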
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a819b41..b4b68b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2784,7 +2784,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2818,33 +2820,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2907,11 +2888,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab..d299a25 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
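
(If anyone wants the optimization the comment above declines, a plausible
shape is the following untested sketch, reusing the function's existing
newFrozenXid/newMinMulti locals.  Note it deliberately ignores the repair
path for datfrozenxid values from the future, which is one reason the patch
doesn't bother.)

	/* Hypothetical pre-check, not in the patch: if the syscache copy already
	 * shows nothing to do, skip the buffer-locking scan and avoid waiting on
	 * a concurrent heap_update(). */
	HeapTuple	cached = SearchSysCache1(DATABASEOID,
										 ObjectIdGetDatum(MyDatabaseId));

	if (HeapTupleIsValid(cached))
	{
		Form_pg_database form = (Form_pg_database) GETSTRUCT(cached);
		bool		maybe_dirty =
			TransactionIdPrecedes(form->datfrozenxid, newFrozenXid) ||
			MultiXactIdPrecedes(form->datminmxid, newMinMulti);

		ReleaseSysCache(cached);
		if (!maybe_dirty)
			return;
	}
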
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..2e13fb9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
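
In SQL terms, the corrected expected output encodes this blocking behavior
(illustrative two-session sequence using the spec's table):

	-- session 1
	BEGIN;
	GRANT SELECT ON intra_grant_inplace TO PUBLIC;  -- uncommitted pg_class update

	-- session 2: now waits for session 1 instead of inplace-updating a
	-- pg_class tuple the GRANT is about to supersede
	ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);

	-- session 1
	COMMIT;

	-- session 2's ALTER completes, and the inplace update survives:
	SELECT relhasindex FROM pg_class
	WHERE oid = 'intra_grant_inplace'::regclass;  -- t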
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
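
For anyone rerunning these by hand: the spec arranges each race with the
injection_points test module, roughly as below (a sketch of what the setup
and steps amount to; see the spec itself for the exact wiring).

	-- make VACUUM sleep just before it pins the pg_class buffer
	SELECT injection_points_attach('inplace-before-pin', 'wait');
	VACUUM vactest.orig50;   -- blocks at 'inplace-before-pin'
	-- ...other sessions run grant2/vac3/mkrels3 to mutate the tuple chain...
	SELECT injection_points_detach('inplace-before-pin');
	SELECT injection_points_wakeup('inplace-before-pin');  -- VACUUM resumes
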
inplace120-locktag-v5.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
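
    Concretely, the protocol at a typical pg_database call site becomes the
    following (sketch; the real sites are in the diff below):

	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for database %u", db_id);
	otid = tuple->t_self;
	/* ... scribble on the copy while holding LOCKTAG_TUPLE ... */
	CatalogTupleUpdate(rel, &otid, tuple);
	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);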

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 107507e..797bddf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3212,6 +3219,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4078,6 +4089,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6123,6 +6217,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6144,6 +6239,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6163,6 +6259,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6172,6 +6270,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6183,6 +6290,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6249,6 +6358,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6274,9 +6384,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6334,7 +6447,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88..8d04ca0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4409,14 +4409,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8fcb188..7fa80a5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3609,6 +3609,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3617,9 +3618,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3627,7 +3629,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4121,6 +4124,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4143,7 +4147,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	otid = reltup->t_self;
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4170,7 +4175,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14917,7 +14923,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15021,6 +15027,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17170,7 +17177,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17189,6 +17197,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17201,7 +17211,8 @@
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17213,6 +17225,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a2442b7..b70d2f6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2320,6 +2320,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2328,6 +2330,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2421,6 +2424,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2525,6 +2536,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2850,6 +2869,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2886,17 +2906,33 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
 	{
 		Assert(resultRelInfo->ri_TrigDesc);
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
 	}
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even tuples that don't match mas_whenqual, which
+			 * isn't ideal.  MERGE on system catalogs is a minor use case, so
+			 * don't bother doing better.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2968,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2979,7 +3015,7 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
@@ -2999,7 +3035,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3017,7 +3054,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3028,7 +3065,7 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3109,7 +3146,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3183,7 +3220,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3201,6 +3238,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3229,7 +3275,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3257,13 +3303,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3310,6 +3356,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3761,6 +3811,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4073,6 +4124,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4084,6 +4137,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4092,6 +4146,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4103,6 +4162,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 930cc03..3f1e8ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3770,6 +3770,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3812,11 +3813,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3936,9 +3938,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b62c96f..eab0add 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#43Alexander Lakhin
exclusion@gmail.com
In reply to: Noah Misch (#42)
Re: race condition in pg_class

Hello Noah,

29.06.2024 05:42, Noah Misch wrote:

Good point, any effort on (2) would be wasted once the fixes get certified. I
pushed (1). I'm attaching the rebased fix patches.

Please look at a new anomaly, introduced by inplace110-successors-v5.patch:
CREATE TABLE t (i int) PARTITION BY LIST(i);
CREATE TABLE p1 (i int);
ALTER TABLE t ATTACH PARTITION p1 FOR VALUES IN (1);
ALTER TABLE t DETACH PARTITION p1;
ANALYZE t;

triggers unexpected
ERROR:  tuple to be updated was already modified by an operation triggered by the current command

Best regards,
Alexander

#44Noah Misch
noah@leadboat.com
In reply to: Alexander Lakhin (#43)
3 attachment(s)
Re: race condition in pg_class

On Wed, Jul 03, 2024 at 06:00:00AM +0300, Alexander Lakhin wrote:

29.06.2024 05:42, Noah Misch wrote:

Good point, any effort on (2) would be wasted once the fixes get certified. I
pushed (1). I'm attaching the rebased fix patches.

Please look at a new anomaly, introduced by inplace110-successors-v5.patch:
CREATE TABLE t (i int) PARTITION BY LIST(i);
CREATE TABLE p1 (i int);
ALTER TABLE t ATTACH PARTITION p1 FOR VALUES IN (1);
ALTER TABLE t DETACH PARTITION p1;
ANALYZE t;

triggers unexpected
ERROR:  tuple to be updated was already modified by an operation triggered by the current command

Thanks. Today, it's okay to issue heap_inplace_update() after heap_update()
without an intervening CommandCounterIncrement(). The patch makes the CCI
required. The ANALYZE in your example reaches this error via a heap_update()
that sets relhassubclass=f. I've fixed this by just adding a CCI (and by
adding to the tests in vacuum.sql).

The alternative would be to allow inplace updates on TM_SelfModified tuples.
I can't think of a specific problem with allowing that, but I feel that would
make system state interactions harder to reason about. It might be optimal to
allow that in back branches only, to reduce the chance of releasing a bug like
the one you found.
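
To make the new ordering rule concrete, here is a condensed sketch of the
ANALYZE flow after the fix.  Variable names are abbreviated from
do_analyze_rel() and acquire_inherited_sample_rows(); these are not the
literal hunks (see inplace110-successors-v6.patch below):

/* Earlier in the same command: a transactional pg_class update. */
SetRelHasSubclass(RelationGetRelid(onerel), false); /* heap_update() */

/*
 * CCI before the inplace update.  Without it, the scan below finds our own
 * uncommitted heap_update(), HeapTupleSatisfiesUpdate() reports
 * TM_SelfModified, and inplace_xmax_lock() raises the error quoted above.
 */
CommandCounterIncrement();
vac_update_relstats(onerel, relpages, totalrows, relallvisible, hasindex,
					InvalidTransactionId, InvalidMultiXactId,
					NULL, NULL, in_outer_xact); /* inplace update path */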

Attachments:

inplace090-LOCKTAG_TUPLE-eoxact-v6.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..461d925 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
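
The check fires for any LOCKTAG_TUPLE still held when LockReleaseAll() runs
at commit.  A sketch of the kind of leak it catches; the caller below is
hypothetical, in an assert-enabled build:

/* Hypothetical bug: LockTuple() without the matching UnlockTuple(). */
static void
leaky_pg_class_update(Relation pg_class, HeapTuple tuple)
{
	ItemPointerData otid = tuple->t_self;

	LockTuple(pg_class, &otid, InplaceUpdateTupleLock);
	CatalogTupleUpdate(pg_class, &otid, tuple);

	/*
	 * UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock) was forgotten, so
	 * at COMMIT, LockReleaseAll() sees the surviving tuple lock and emits
	 * "WARNING:  tuple lock held at commit".
	 */
}
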
inplace110-successors-v6.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..faec28a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4072,6 +4076,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6041,34 +6084,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6081,21 +6135,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6210,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6243,191 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress.
+ * All inplace updaters take a lock that conflicts with DROP.  If explicit
+ * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
+ * update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would tolerate either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.  C code of other SQL
+		 * statements might get here after a heap_update() of the same row, in
+		 * the absence of an intervening CommandCounterIncrement().
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a819b41..b4b68b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2784,7 +2784,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2818,33 +2820,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2907,11 +2888,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7d2cd24..c590a2a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -629,7 +629,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		else
 			relallvisible = 0;
 
-		/* Update pg_class for table relation */
+		/*
+		 * Update pg_class for table relation.  CCI first, in case acquirefunc
+		 * updated pg_class.
+		 */
+		CommandCounterIncrement();
 		vac_update_relstats(onerel,
 							relpages,
 							totalrows,
@@ -664,6 +668,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * Partitioned tables don't have storage, so we don't set any fields
 		 * in their pg_class entries except for reltuples and relhasindex.
 		 */
+		CommandCounterIncrement();
 		vac_update_relstats(onerel, -1, totalrows,
 							0, hasindex, InvalidTransactionId,
 							InvalidMultiXactId,
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab..d299a25 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..2e13fb9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 330fcd8..84a7208 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -83,6 +83,15 @@ INSERT INTO vactst SELECT generate_series(301, 400);
 DELETE FROM vactst WHERE i % 5 <> 0; -- delete a few rows inside
 ANALYZE vactst;
 COMMIT;
+-- Test ANALYZE setting relhassubclass=f
+CREATE TABLE past_inh_parent ();
+CREATE TABLE past_inh_child () INHERITS (past_inh_parent);
+DROP TABLE past_inh_child;
+ANALYZE past_inh_parent;
+CREATE TABLE past_parted (i int) PARTITION BY LIST(i);
+CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
+DROP TABLE past_part;
+ANALYZE past_parted;
 VACUUM FULL pg_am;
 VACUUM FULL pg_class;
 VACUUM FULL pg_database;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 0b63ef8..9952ba7a 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -67,6 +67,16 @@ DELETE FROM vactst WHERE i % 5 <> 0; -- delete a few rows inside
 ANALYZE vactst;
 COMMIT;
 
+-- Test ANALYZE setting relhassubclass=f
+CREATE TABLE past_inh_parent ();
+CREATE TABLE past_inh_child () INHERITS (past_inh_parent);
+DROP TABLE past_inh_child;
+ANALYZE past_inh_parent;
+CREATE TABLE past_parted (i int) PARTITION BY LIST(i);
+CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
+DROP TABLE past_part;
+ANALYZE past_parted;
+
 VACUUM FULL pg_am;
 VACUUM FULL pg_class;
 VACUUM FULL pg_database;
inplace120-locktag-v6.patch (text/plain)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
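
In outline, every write site changed below follows the same pattern: take the
tuple lock before copying the old row, and hold it across heap_update().  A
minimal sketch of that pattern, with "reloid" standing in for the target
relation's OID and error handling trimmed (compare the RenameDatabase() and
update_relispartition() hunks):

	Relation	rel = table_open(RelationRelationId, RowExclusiveLock);
	ItemPointerData otid;
	HeapTuple	tup;

	/* take LOCKTAG_TUPLE before copying; blocks concurrent inplace updaters */
	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for relation %u", reloid);
	otid = tup->t_self;

	/* scribble on the copy */
	((Form_pg_class) GETSTRUCT(tup))->relhasindex = true;

	/* release the tuple lock only after the heap_update() is done */
	CatalogTupleUpdate(rel, &otid, tup);
	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);

	heap_freetuple(tup);
	table_close(rel, RowExclusiveLock);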

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 107507e..797bddf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3212,6 +3219,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4078,6 +4089,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6123,6 +6217,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6144,6 +6239,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6163,6 +6259,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6172,6 +6270,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6183,6 +6290,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6249,6 +6358,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6274,9 +6384,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6334,7 +6447,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88..8d04ca0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4409,14 +4409,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8fcb188..7fa80a5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3609,6 +3609,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3617,9 +3618,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3627,7 +3629,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4121,6 +4124,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4143,7 +4147,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4170,7 +4175,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14917,7 +14923,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15021,6 +15027,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17170,7 +17177,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17189,6 +17197,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17201,7 +17211,9 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
+
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17213,6 +17225,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a2442b7..b70d2f6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2320,6 +2320,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2328,6 +2330,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2421,6 +2424,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2525,6 +2536,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2850,6 +2869,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2886,17 +2906,33 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
 	{
 		Assert(resultRelInfo->ri_TrigDesc);
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
 	}
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even tuples that don't match mas_whenqual, which
+			 * isn't ideal.  MERGE on system catalogs is a minor use case, so
+			 * don't bother doing better.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2968,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2979,7 +3015,7 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
@@ -2999,7 +3035,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3017,7 +3054,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3028,7 +3065,7 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3109,7 +3146,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3183,7 +3220,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3201,6 +3238,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3229,7 +3275,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3257,13 +3303,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3310,6 +3356,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3761,6 +3811,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4073,6 +4124,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4084,6 +4137,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4092,6 +4146,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4103,6 +4162,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 930cc03..3f1e8ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3770,6 +3770,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3812,11 +3813,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3936,9 +3938,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b62c96f..eab0add 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#45Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#44)
4 attachment(s)
Re: race condition in pg_class

On Wed, Jul 03, 2024 at 04:09:54PM -0700, Noah Misch wrote:

On Wed, Jul 03, 2024 at 06:00:00AM +0300, Alexander Lakhin wrote:

29.06.2024 05:42, Noah Misch wrote:

Good point, any effort on (2) would be wasted once the fixes get certified. I
pushed (1). I'm attaching the rebased fix patches.

Please look at a new anomaly, introduced by inplace110-successors-v5.patch:
CREATE TABLE t (i int) PARTITION BY LIST(i);
CREATE TABLE p1 (i int);
ALTER TABLE t ATTACH PARTITION p1 FOR VALUES IN (1);
ALTER TABLE t DETACH PARTITION p1;
ANALYZE t;

triggers unexpected
ERROR:  tuple to be updated was already modified by an operation triggered by the current command

Thanks. Today, it's okay to issue heap_inplace_update() after heap_update()
without an intervening CommandCounterIncrement().

Correction: it's not okay today. If code does that, heap_inplace_update()
mutates a tuple that is going to become invisible at CCI. The lack of CCI
yields a minor live bug in v14+. Its consequences seem to be limited to
failing to update reltuples for a partitioned table having zero partitions.
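
Concretely, on an unpatched v14+ build the following can commit
relhassubclass=f while keeping the stale reltuples (a sketch; the table names
just mirror the regression test added below):

CREATE TABLE past_parted (i int) PARTITION BY LIST (i);
CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
INSERT INTO past_parted VALUES (1), (1);
ANALYZE past_parted;  -- sets reltuples = 2
DROP TABLE past_part;
ANALYZE past_parted;  -- heap_update() sets relhassubclass=f; without a CCI the
                      -- inplace reltuples=0 lands on the old, soon-dead tuple
SELECT reltuples, relhassubclass
  FROM pg_class WHERE oid = 'past_parted'::regclass;
-- expected 0 | f, but the stale reltuples = 2 can survive the commit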

The patch makes the CCI
required. The ANALYZE in your example reaches this with a heap_update to set
relhassubclass=f. I've fixed this by just adding a CCI (and adding to the
tests in vacuum.sql).

That's still the right fix, but I've separated it into its own patch and
expanded the test. All the non-comment changes between v5 and v6 are now part
of the separate patch.

The alternative would be to allow inplace updates on TM_SelfModified tuples.
I can't think of a specific problem with allowing that, but I feel that would
make system state interactions harder to reason about. It might be optimal to
allow that in back branches only, to reduce the chance of releasing a bug like
the one you found.

Allowing a mutation of a TM_SelfModified tuple is bad, since that tuple is
going to become dead soon. Mutating its successor could be okay. Since we'd
expect such code to be unreachable, I'm not keen to carry such code. For that
scenario, I'd rather keep the error you encountered. Other opinions?

Attachments:

inplace085-CCI-analyze-v7.patch (text/plain)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Don't lose partitioned table reltuples=0 after relhassubclass=f.
    
    ANALYZE sets relhassubclass=f when a partitioned table no longer has
    partitions.  An ANALYZE doing that proceeded to apply the inplace update
    of pg_class.reltuples to the old pg_class tuple instead of the new
    tuple, losing that reltuples=0 change if the ANALYZE committed.
    Non-partitioning inheritance trees were unaffected.  Back-patch to v14,
    where commit 375aed36ad83f0e021e9bdd3a0034c0c992c66dc introduced
    maintenance of partitioned table pg_class.reltuples.
    
    Reviewed by FIXME.  Reported by Alexander Lakhin.
    
    Discussion: https://postgr.es/m/a295b499-dcab-6a99-c06e-01cf60593344@gmail.com

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7d2cd24..c590a2a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -629,7 +629,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		else
 			relallvisible = 0;
 
-		/* Update pg_class for table relation */
+		/*
+		 * Update pg_class for table relation.  CCI first, in case acquirefunc
+		 * updated pg_class.
+		 */
+		CommandCounterIncrement();
 		vac_update_relstats(onerel,
 							relpages,
 							totalrows,
@@ -664,6 +668,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * Partitioned tables don't have storage, so we don't set any fields
 		 * in their pg_class entries except for reltuples and relhasindex.
 		 */
+		CommandCounterIncrement();
 		vac_update_relstats(onerel, -1, totalrows,
 							0, hasindex, InvalidTransactionId,
 							InvalidMultiXactId,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 330fcd8..2eba712 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -83,6 +83,53 @@ INSERT INTO vactst SELECT generate_series(301, 400);
 DELETE FROM vactst WHERE i % 5 <> 0; -- delete a few rows inside
 ANALYZE vactst;
 COMMIT;
+-- Test ANALYZE setting relhassubclass=f for non-partitioning inheritance
+BEGIN;
+CREATE TABLE past_inh_parent ();
+CREATE TABLE past_inh_child () INHERITS (past_inh_parent);
+INSERT INTO past_inh_child DEFAULT VALUES;
+INSERT INTO past_inh_child DEFAULT VALUES;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         0 | t
+(1 row)
+
+DROP TABLE past_inh_child;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         0 | f
+(1 row)
+
+COMMIT;
+-- Test ANALYZE setting relhassubclass=f for partitioning
+BEGIN;
+CREATE TABLE past_parted (i int) PARTITION BY LIST(i);
+CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
+INSERT INTO past_parted VALUES (1),(1);
+ANALYZE past_parted;
+DROP TABLE past_part;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         2 | t
+(1 row)
+
+ANALYZE past_parted;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         0 | f
+(1 row)
+
+COMMIT;
 VACUUM FULL pg_am;
 VACUUM FULL pg_class;
 VACUUM FULL pg_database;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 0b63ef8..548cd7a 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -67,6 +67,35 @@ DELETE FROM vactst WHERE i % 5 <> 0; -- delete a few rows inside
 ANALYZE vactst;
 COMMIT;
 
+-- Test ANALYZE setting relhassubclass=f for non-partitioning inheritance
+BEGIN;
+CREATE TABLE past_inh_parent ();
+CREATE TABLE past_inh_child () INHERITS (past_inh_parent);
+INSERT INTO past_inh_child DEFAULT VALUES;
+INSERT INTO past_inh_child DEFAULT VALUES;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+DROP TABLE past_inh_child;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+COMMIT;
+
+-- Test ANALYZE setting relhassubclass=f for partitioning
+BEGIN;
+CREATE TABLE past_parted (i int) PARTITION BY LIST(i);
+CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
+INSERT INTO past_parted VALUES (1),(1);
+ANALYZE past_parted;
+DROP TABLE past_part;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+ANALYZE past_parted;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+COMMIT;
+
 VACUUM FULL pg_am;
 VACUUM FULL pg_class;
 VACUUM FULL pg_database;
inplace090-LOCKTAG_TUPLE-eoxact-v7.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..461d925 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
inplace110-successors-v7.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME and Alexander Lakhin.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..faec28a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4072,6 +4076,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6041,34 +6084,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6081,21 +6135,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6210,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6243,191 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress.
+ * All inplace updaters take a lock that conflicts with DROP.  If explicit
+ * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
+ * update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would tolerate either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.  C code of other SQL
+		 * statements might get here after a heap_update() of the same row, in
+		 * the absence of an intervening CommandCounterIncrement().
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a819b41..b4b68b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2784,7 +2784,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2818,33 +2820,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2907,11 +2888,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab..d299a25 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..2e13fb9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
inplace120-locktag-v7.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faec28a..051aa10 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3212,6 +3219,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4078,6 +4089,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6123,6 +6217,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6144,6 +6239,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6163,6 +6259,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6172,6 +6270,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6183,6 +6290,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6249,6 +6358,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6274,9 +6384,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6335,7 +6448,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88..8d04ca0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4409,14 +4409,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8fcb188..7fa80a5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3609,6 +3609,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3617,9 +3618,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3627,7 +3629,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4121,6 +4124,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4143,7 +4147,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4170,7 +4175,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14917,7 +14923,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15021,6 +15027,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17170,7 +17177,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock is held for relkind=c (composite type), so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17189,6 +17197,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17201,7 +17211,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17213,6 +17225,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a2442b7..b70d2f6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2320,6 +2320,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2328,6 +2330,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2421,6 +2424,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2525,6 +2536,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2850,6 +2869,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2886,17 +2906,33 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
 	{
 		Assert(resultRelInfo->ri_TrigDesc);
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
 	}
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even tuples that don't match mas_whenqual, which
+			 * isn't ideal.  MERGE on system catalogs is a minor use case, so
+			 * don't bother doing better.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2968,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2979,7 +3015,7 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
@@ -2999,7 +3035,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3017,7 +3054,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3028,7 +3065,7 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3109,7 +3146,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3183,7 +3220,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3201,6 +3238,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3229,7 +3275,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3257,13 +3303,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3310,6 +3356,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3761,6 +3811,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4073,6 +4124,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4084,6 +4137,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4092,6 +4146,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4103,6 +4162,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 930cc03..3f1e8ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3770,6 +3770,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3812,11 +3813,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3936,9 +3938,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
 * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b62c96f..eab0add 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#46Alexander Lakhin
exclusion@gmail.com
In reply to: Noah Misch (#40)
Re: race condition in pg_class

Hello Noah,

28.06.2024 08:13, Noah Misch wrote:

Pushed. ...

Please also look at another anomaly I've discovered.

An Assert added with d5f788b41 may be falsified with:
CREATE TABLE t(a int PRIMARY KEY);
INSERT INTO t VALUES (1);
CREATE VIEW v AS SELECT * FROM t;

MERGE INTO v USING (VALUES (1)) AS va(a) ON v.a = va.a
  WHEN MATCHED THEN DO NOTHING
  WHEN NOT MATCHED THEN DO NOTHING;

TRAP: failed Assert("resultRelInfo->ri_TrigDesc"), File: "nodeModifyTable.c", Line: 2891, PID: 1590670

Best regards,
Alexander

#47Noah Misch
noah@leadboat.com
In reply to: Alexander Lakhin (#46)
5 attachment(s)
Re: race condition in pg_class

On Thu, Jul 04, 2024 at 08:00:00AM +0300, Alexander Lakhin wrote:

28.06.2024 08:13, Noah Misch wrote:

Pushed. ...

Please look also at another anomaly, I've discovered.

An Assert added with d5f788b41 may be falsified with:
CREATE TABLE t(a int PRIMARY KEY);
INSERT INTO t VALUES (1);
CREATE VIEW v AS SELECT * FROM t;

MERGE INTO v USING (VALUES (1)) AS va(a) ON v.a = va.a
  WHEN MATCHED THEN DO NOTHING
  WHEN NOT MATCHED THEN DO NOTHING;

TRAP: failed Assert("resultRelInfo->ri_TrigDesc"), File: "nodeModifyTable.c", Line: 2891, PID: 1590670

Thanks. When all the MERGE actions are DO NOTHING, view_has_instead_trigger()
returns true, so we use the wholerow code regardless of the view's triggers or
auto-update capability. The behavior is fine, so I'm fixing the new assertion
and comments with the new patch inplace087-merge-DO-NOTHING-v8.patch. The
closest relevant tests processed zero rows, so they narrowly avoided tripping
this assertion.
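
To make the zero-rows point concrete, a minimal sketch (assuming an
assert-enabled v17 build and a scratch database; the names t and v are
arbitrary). It mirrors Alexander's repro, split into the path the existing
tests covered and the path they missed:

CREATE TABLE t (a int PRIMARY KEY);
CREATE VIEW v AS SELECT * FROM t;

-- Empty source: nothing is MATCHED or NOT MATCHED, ExecMergeMatched() never
-- runs, and the assertion is never reached. Tests shaped like this passed.
MERGE INTO v USING (SELECT 1 AS a WHERE false) AS va ON v.a = va.a
  WHEN MATCHED THEN DO NOTHING
  WHEN NOT MATCHED THEN DO NOTHING;

-- Matching row: ExecMergeMatched() takes the wholerow path, and before the
-- fix Assert(resultRelInfo->ri_TrigDesc) failed, since v has no trigger.
INSERT INTO t VALUES (1);
MERGE INTO v USING (VALUES (1)) AS va(a) ON v.a = va.a
  WHEN MATCHED THEN DO NOTHING;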

Attachments:

inplace085-CCI-analyze-v8.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Don't lose partitioned table reltuples=0 after relhassubclass=f.
    
    ANALYZE sets relhassubclass=f when a partitioned table no longer has
    partitions.  An ANALYZE doing that proceeded to apply the inplace update
    of pg_class.reltuples to the old pg_class tuple instead of the new
    tuple, losing that reltuples=0 change if the ANALYZE committed.
    Non-partitioning inheritance trees were unaffected.  Back-patch to v14,
    where commit 375aed36ad83f0e021e9bdd3a0034c0c992c66dc introduced
    maintenance of partitioned table pg_class.reltuples.
    
    Reviewed by FIXME.  Reported by Alexander Lakhin.
    
    Discussion: https://postgr.es/m/a295b499-dcab-6a99-c06e-01cf60593344@gmail.com

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7d2cd24..c590a2a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -629,7 +629,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		else
 			relallvisible = 0;
 
-		/* Update pg_class for table relation */
+		/*
+		 * Update pg_class for table relation.  CCI first, in case acquirefunc
+		 * updated pg_class.
+		 */
+		CommandCounterIncrement();
 		vac_update_relstats(onerel,
 							relpages,
 							totalrows,
@@ -664,6 +668,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * Partitioned tables don't have storage, so we don't set any fields
 		 * in their pg_class entries except for reltuples and relhasindex.
 		 */
+		CommandCounterIncrement();
 		vac_update_relstats(onerel, -1, totalrows,
 							0, hasindex, InvalidTransactionId,
 							InvalidMultiXactId,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 330fcd8..2eba712 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -83,6 +83,53 @@ INSERT INTO vactst SELECT generate_series(301, 400);
 DELETE FROM vactst WHERE i % 5 <> 0; -- delete a few rows inside
 ANALYZE vactst;
 COMMIT;
+-- Test ANALYZE setting relhassubclass=f for non-partitioning inheritance
+BEGIN;
+CREATE TABLE past_inh_parent ();
+CREATE TABLE past_inh_child () INHERITS (past_inh_parent);
+INSERT INTO past_inh_child DEFAULT VALUES;
+INSERT INTO past_inh_child DEFAULT VALUES;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         0 | t
+(1 row)
+
+DROP TABLE past_inh_child;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         0 | f
+(1 row)
+
+COMMIT;
+-- Test ANALYZE setting relhassubclass=f for partitioning
+BEGIN;
+CREATE TABLE past_parted (i int) PARTITION BY LIST(i);
+CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
+INSERT INTO past_parted VALUES (1),(1);
+ANALYZE past_parted;
+DROP TABLE past_part;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         2 | t
+(1 row)
+
+ANALYZE past_parted;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+ reltuples | relhassubclass 
+-----------+----------------
+         0 | f
+(1 row)
+
+COMMIT;
 VACUUM FULL pg_am;
 VACUUM FULL pg_class;
 VACUUM FULL pg_database;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 0b63ef8..548cd7a 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -67,6 +67,35 @@ DELETE FROM vactst WHERE i % 5 <> 0; -- delete a few rows inside
 ANALYZE vactst;
 COMMIT;
 
+-- Test ANALYZE setting relhassubclass=f for non-partitioning inheritance
+BEGIN;
+CREATE TABLE past_inh_parent ();
+CREATE TABLE past_inh_child () INHERITS (past_inh_parent);
+INSERT INTO past_inh_child DEFAULT VALUES;
+INSERT INTO past_inh_child DEFAULT VALUES;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+DROP TABLE past_inh_child;
+ANALYZE past_inh_parent;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_inh_parent'::regclass;
+COMMIT;
+
+-- Test ANALYZE setting relhassubclass=f for partitioning
+BEGIN;
+CREATE TABLE past_parted (i int) PARTITION BY LIST(i);
+CREATE TABLE past_part PARTITION OF past_parted FOR VALUES IN (1);
+INSERT INTO past_parted VALUES (1),(1);
+ANALYZE past_parted;
+DROP TABLE past_part;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+ANALYZE past_parted;
+SELECT reltuples, relhassubclass
+  FROM pg_class WHERE oid = 'past_parted'::regclass;
+COMMIT;
+
 VACUUM FULL pg_am;
 VACUUM FULL pg_class;
 VACUUM FULL pg_database;
inplace087-merge-DO-NOTHING-v8.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix new assertion for MERGE view_name ... DO NOTHING.
    
    Such queries don't expand automatically updatable views, and ModifyTable
    uses the wholerow attribute unconditionally.  The user-visible behavior
    is fine, so change to more-specific assertions.  Commit
    d5f788b41dc2cbdde6e7694c70dda54d829a5ed5 added the wrong assertion.
    Back-patch to v17, where commit 5f2e179bd31e5f5803005101eb12a8d7bf8db8f3
    introduced MERGE view_name.
    
    Reviewed by FIXME.  Reported by Alexander Lakhin.
    
    Discussion: https://postgr.es/m/e4b40a88-c134-6926-3196-bc4501cb87a2@gmail.com

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a2442b7..4913e49 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -24,12 +24,13 @@
  *		values plus row-locating info for UPDATE and MERGE cases, or just the
  *		row-locating info for DELETE cases.
  *
- *		The relation to modify can be an ordinary table, a view having an
- *		INSTEAD OF trigger, or a foreign table.  Earlier processing already
- *		pointed ModifyTable to the underlying relations of any automatically
- *		updatable view not using an INSTEAD OF trigger, so code here can
- *		assume it won't have one as a modification target.  This node does
- *		process ri_WithCheckOptions, which may have expressions from those
+ *		The relation to modify can be an ordinary table, a foreign table, or a
+ *		view.  If it's a view, either it has sufficient INSTEAD OF triggers or
+ *		this node executes only MERGE ... DO NOTHING.  If the original MERGE
+ *		targeted a view not in one of those two categories, earlier processing
+ *		already pointed the ModifyTable result relation to an underlying
+ *		relation of that other view.  This node does process
+ *		ri_WithCheckOptions, which may have expressions from those other,
  *		automatically updatable views.
  *
  *		MERGE runs a join between the source relation and the target table.
@@ -2726,10 +2727,10 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 
 	/*-----
 	 * If we are dealing with a WHEN MATCHED case, tupleid or oldtuple is
-	 * valid, depending on whether the result relation is a table or a view
-	 * having an INSTEAD OF trigger.  We execute the first action for which
-	 * the additional WHEN MATCHED AND quals pass.  If an action without quals
-	 * is found, that action is executed.
+	 * valid, depending on whether the result relation is a table or a view.
+	 * We execute the first action for which the additional WHEN MATCHED AND
+	 * quals pass.  If an action without quals is found, that action is
+	 * executed.
 	 *
 	 * Similarly, in the WHEN NOT MATCHED BY SOURCE case, tupleid or oldtuple
 	 * is valid, and we look at the given WHEN NOT MATCHED BY SOURCE actions
@@ -2820,8 +2821,8 @@ ExecMerge(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
  * Check and execute the first qualifying MATCHED or NOT MATCHED BY SOURCE
  * action, depending on whether the join quals are satisfied.  If the target
  * relation is a table, the current target tuple is identified by tupleid.
- * Otherwise, if the target relation is a view having an INSTEAD OF trigger,
- * oldtuple is the current target tuple from the view.
+ * Otherwise, if the target relation is a view, oldtuple is the current target
+ * tuple from the view.
  *
  * We start from the first WHEN MATCHED or WHEN NOT MATCHED BY SOURCE action
  * and check if the WHEN quals pass, if any. If the WHEN quals for the first
@@ -2887,11 +2888,8 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
 	if (oldtuple != NULL)
-	{
-		Assert(resultRelInfo->ri_TrigDesc);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
-	}
 	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
 											tupleid,
 											SnapshotAny,
@@ -2983,6 +2981,9 @@ lmerge_matched:
 				}
 				else
 				{
+					/* called table_tuple_fetch_row_version() above */
+					Assert(oldtuple == NULL);
+
 					result = ExecUpdateAct(context, resultRelInfo, tupleid,
 										   NULL, newslot, canSetTag,
 										   &updateCxt);
@@ -3031,8 +3032,13 @@ lmerge_matched:
 						return NULL;	/* "do nothing" */
 				}
 				else
+				{
+					/* called table_tuple_fetch_row_version() above */
+					Assert(oldtuple == NULL);
+
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
 										   false);
+				}
 
 				if (result == TM_Ok)
 				{
@@ -4004,8 +4010,8 @@ ExecModifyTable(PlanState *pstate)
 			 * know enough here to set t_tableOid.  Quite separately from
 			 * this, the FDW may fetch its own junk attrs to identify the row.
 			 *
-			 * Other relevant relkinds, currently limited to views having
-			 * INSTEAD OF triggers, always have a wholerow attribute.
+			 * Other relevant relkinds, currently limited to views, always
+			 * have a wholerow attribute.
 			 */
 			else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
 			{
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568..5a2da97 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -199,6 +199,9 @@ MERGE INTO ro_view13 AS t USING (VALUES (3, 'Row 3')) AS v(a,b) ON t.a = v.a
 ERROR:  cannot insert into view "ro_view13"
 DETAIL:  Views that do not select from a single table or view are not automatically updatable.
 HINT:  To enable inserting into the view using MERGE, provide an INSTEAD OF INSERT trigger.
+MERGE INTO ro_view13 AS t USING (VALUES (2, 'Row 2')) AS v(a,b) ON t.a = v.a
+  WHEN MATCHED THEN DO NOTHING
+  WHEN NOT MATCHED THEN DO NOTHING; -- should be OK to do nothing
 MERGE INTO ro_view13 AS t USING (VALUES (3, 'Row 3')) AS v(a,b) ON t.a = v.a
   WHEN MATCHED THEN DO NOTHING
   WHEN NOT MATCHED THEN DO NOTHING; -- should be OK to do nothing
@@ -375,6 +378,8 @@ DELETE FROM ro_view18;
 ERROR:  cannot delete from view "ro_view18"
 DETAIL:  Views that do not select from a single table or view are not automatically updatable.
 HINT:  To enable deleting from the view, provide an INSTEAD OF DELETE trigger or an unconditional ON DELETE DO INSTEAD rule.
+MERGE INTO ro_view18 AS t USING (VALUES (1, 'Row 1')) AS v(a,b) ON t.a = v.a
+  WHEN MATCHED THEN DO NOTHING; -- should be OK to do nothing
 UPDATE ro_view19 SET last_value=1000;
 ERROR:  cannot update view "ro_view19"
 DETAIL:  Views that do not select from a single table or view are not automatically updatable.
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index e0ab923..abfa557 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -68,6 +68,9 @@ MERGE INTO ro_view13 AS t USING (VALUES (2, 'Row 2')) AS v(a,b) ON t.a = v.a
   WHEN MATCHED THEN UPDATE SET b = v.b;
 MERGE INTO ro_view13 AS t USING (VALUES (3, 'Row 3')) AS v(a,b) ON t.a = v.a
   WHEN NOT MATCHED THEN INSERT VALUES (v.a, v.b);
+MERGE INTO ro_view13 AS t USING (VALUES (2, 'Row 2')) AS v(a,b) ON t.a = v.a
+  WHEN MATCHED THEN DO NOTHING
+  WHEN NOT MATCHED THEN DO NOTHING; -- should be OK to do nothing
 MERGE INTO ro_view13 AS t USING (VALUES (3, 'Row 3')) AS v(a,b) ON t.a = v.a
   WHEN MATCHED THEN DO NOTHING
   WHEN NOT MATCHED THEN DO NOTHING; -- should be OK to do nothing
@@ -121,6 +124,8 @@ DELETE FROM rw_view16 WHERE a=-3; -- should be OK
 -- Read-only views
 INSERT INTO ro_view17 VALUES (3, 'ROW 3');
 DELETE FROM ro_view18;
+MERGE INTO ro_view18 AS t USING (VALUES (1, 'Row 1')) AS v(a,b) ON t.a = v.a
+  WHEN MATCHED THEN DO NOTHING; -- should be OK to do nothing
 UPDATE ro_view19 SET last_value=1000;
 UPDATE ro_view20 SET b=upper(b);
 
inplace090-LOCKTAG_TUPLE-eoxact-v8.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..461d925 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
inplace110-successors-v8.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME and Alexander Lakhin.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
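
To illustrate the reader rule above, a sketch (not from the patch;
consume_xid() is a hypothetical stand-in consumer): fetch the
inplace-updated field once into a local and use only the copy, since a
second read from the shared buffer could observe a concurrent overwrite:

	Form_pg_database dbform = (Form_pg_database) GETSTRUCT(dbtup);
	TransactionId datfrozenxid = dbform->datfrozenxid; /* one aligned 4-byte fetch */

	if (TransactionIdIsNormal(datfrozenxid))
		consume_xid(datfrozenxid);	/* use the local copy; don't re-read dbform */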
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..faec28a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4072,6 +4076,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6041,34 +6084,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6081,21 +6135,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6210,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6243,191 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress:
+ * all inplace updaters take a lock that conflicts with DROP.  If explicit
+ * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
+ * update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would tolerate either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.  C code of other SQL
+		 * statements might get here after a heap_update() of the same row, in
+		 * the absence of an intervening CommandCounterIncrement().
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
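
For reference, the end-to-end shape of the new API, as the
index_update_stats() hunk below applies it ("dirty" computation elided):

	ScanKeyData key[1];
	HeapTuple	tup;
	void	   *state;

	ScanKeyInit(&key[0], Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid));
	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
							 &tup, &state);
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "could not find tuple for relation %u", relid);
	/* buffer is exclusive-locked from here; keep this stretch short */
	((Form_pg_class) GETSTRUCT(tup))->relhasindex = true;
	if (dirty)
		heap_inplace_update_finish(state, tup); /* write, WAL, inval, unlock */
	else
		heap_inplace_update_cancel(state);	/* just unlock */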
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a819b41..b4b68b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2784,7 +2784,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2818,33 +2820,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2907,11 +2888,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab..d299a25 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..2e13fb9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
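
The next patch has heap_update() callers take LOCKTAG_TUPLE before copying
the old tuple.  The caller-side pattern it installs (compare the aclchk.c
hunks below; sketch only, error paths elided):

	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for relation %u", relOid);
	/* ... form newtuple from tuple ... */
	CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
	ReleaseSysCache(tuple);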
Attachment: inplace120-locktag-v8.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faec28a..051aa10 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3212,6 +3219,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4078,6 +4089,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6123,6 +6217,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6144,6 +6239,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6163,6 +6259,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6172,6 +6270,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6183,6 +6290,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6249,6 +6358,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6274,9 +6384,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6335,7 +6448,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88..8d04ca0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4409,14 +4409,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8fcb188..7fa80a5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3609,6 +3609,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3617,9 +3618,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3627,7 +3629,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4121,6 +4124,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4143,7 +4147,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4170,7 +4175,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14917,7 +14923,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15021,6 +15027,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17170,7 +17177,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c (composite type), so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17189,6 +17197,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17201,7 +17211,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17213,6 +17225,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables; check_publication_add_relation()
+	 * is responsible for enforcing that.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4913e49..02be418 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2321,6 +2321,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2329,6 +2331,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2422,6 +2425,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2526,6 +2537,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2851,6 +2870,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2887,14 +2907,32 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
+	{
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	}
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even for CMD_DELETE, for CMD_NOTHING, and for tuples
+			 * that don't match mas_whenqual.  MERGE on system catalogs is a
+			 * minor use case, so don't bother optimizing those.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2966,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2977,11 +3015,11 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecUpdateAct(context, resultRelInfo, tupleid,
@@ -3000,7 +3038,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3018,7 +3057,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3029,11 +3068,11 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3115,7 +3154,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3189,7 +3228,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3207,6 +3246,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3235,7 +3283,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3263,13 +3311,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3316,6 +3364,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3767,6 +3819,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4079,6 +4132,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4090,6 +4145,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4098,6 +4154,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4109,6 +4170,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 930cc03..3f1e8ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3770,6 +3770,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3812,11 +3813,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3936,9 +3938,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
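For orientation, the call-site pattern these two helpers enable -- and which
the hunks above apply repeatedly -- looks roughly like the sketch below.  This
is an editorial illustration, not part of the patch: ExampleSetRelName() is a
made-up caller, and error handling beyond the cache-lookup check is elided.

static void
ExampleSetRelName(Oid relid, const char *newname)
{
	Relation	rel = table_open(RelationRelationId, RowExclusiveLock);
	ItemPointerData otid;
	HeapTuple	tup;

	/* returns a copy, with LOCKTAG_TUPLE held in InplaceUpdateTupleLock */
	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for relation %u", relid);

	/* save the old TID; heap_update() overwrites tup->t_self with the new one */
	otid = tup->t_self;

	namestrcpy(&((Form_pg_class) GETSTRUCT(tup))->relname, newname);
	CatalogTupleUpdate(rel, &otid, tup);

	/* only after heap_update() may inplace updaters resume */
	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
	heap_freetuple(tup);
	table_close(rel, RowExclusiveLock);
}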
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b62c96f..eab0add 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#48Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#47)
3 attachment(s)
Re: race condition in pg_class

On Thu, Jul 04, 2024 at 03:08:16PM -0700, Noah Misch wrote:

On Thu, Jul 04, 2024 at 08:00:00AM +0300, Alexander Lakhin wrote:

28.06.2024 08:13, Noah Misch wrote:

Pushed. ...

Please look also at another anomaly I've discovered.

An Assert added with d5f788b41 may be falsified with:
CREATE TABLE t(a int PRIMARY KEY);
INSERT INTO t VALUES (1);
CREATE VIEW v AS SELECT * FROM t;

MERGE INTO v USING (VALUES (1)) AS va(a) ON v.a = va.a
  WHEN MATCHED THEN DO NOTHING
  WHEN NOT MATCHED THEN DO NOTHING;

TRAP: failed Assert("resultRelInfo->ri_TrigDesc"), File: "nodeModifyTable.c", Line: 2891, PID: 1590670

Thanks. When all the MERGE actions are DO NOTHING, view_has_instead_trigger()
returns true ...

I've pushed the two patches for your reports. To placate cfbot, I'm attaching
the remaining patches.

Attachments:

inplace090-LOCKTAG_TUPLE-eoxact-v8.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..461d925 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,11 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
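To illustrate what the new check catches (an editorial sketch, not part of the
patch; "rel", "tup", and "otid" are placeholders): a code path that takes the
tuple lock and reaches commit without releasing it now draws a warning in
assert-enabled builds.

	otid = tup->t_self;
	LockTuple(rel, &otid, InplaceUpdateTupleLock);
	CatalogTupleUpdate(rel, &otid, tup);
	/* forgot: UnlockTuple(rel, &otid, InplaceUpdateTupleLock); */
	CommitTransactionCommand();
	/* at commit, LockReleaseAll() emits WARNING: "tuple lock held at commit" */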
inplace110-successors-v8.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the heap_inplace_update_scan()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by FIXME and Alexander Lakhin.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..dbfa2b7 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,56 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Locking to write inplace-updated tables
+---------------------------------------
+
+[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives heap_inplace_update_scan() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class heap_inplace_update_scan() callers: before the call, acquire
+  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
+  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
+  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
+  on the table.  Locking the index rel is optional.  (This allows VACUUM to
+  overwrite per-index pg_class while holding a lock on the table alone.)  We
+  could allow weaker locks, in which case the next paragraph would simply call
+  for stronger locks for its class of commands.  heap_inplace_update_scan()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock that conflicts with at least one of those from the preceding paragraph.
+  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
+  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
+  to acquire just the LOCKTAG_RELATION.
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+heap_inplace_update_finish().
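A sketch of the reader rule in the last paragraph (editorial, not part of the
patch; the helper name is hypothetical): fetch each inplace-updated field
exactly once, since the buffer may be overwritten under your pin.

static bool
example_relhasindex(HeapTuple tuple)	/* hypothetical reader */
{
	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);

	/*
	 * One read is safe: current inplace-updated fields are aligned and no
	 * wider than four bytes, so a single fetch cannot be torn.  A second
	 * read through classForm could disagree with the first if an inplace
	 * update overwrites the shared buffer in between, so keep the local
	 * copy instead of re-reading.
	 */
	bool		relhasindex = classForm->relhasindex;

	return relhasindex;
}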
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..faec28a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 										 ItemPointer ctid, TransactionId xid,
 										 LockTupleMode mode);
+static bool inplace_xmax_lock(SysScanDesc scan);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
 								   uint16 *new_infomask2);
 static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -4072,6 +4076,45 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6041,34 +6084,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_update_scan - update a row "in place" (ie, overwrite it)
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this skips some predicate locks.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel().
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_scan(Relation relation,
+						 Oid indexId,
+						 bool indexOK,
+						 Snapshot snapshot,
+						 int nkeys, const ScanKeyData *key,
+						 HeapTuple *oldtupcopy, void **state)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
-	uint32		oldlen;
-	uint32		newlen;
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6081,21 +6135,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
 
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+#ifdef USE_ASSERT_CHECKING
+		if (RelationGetRelid(relation) == RelationRelationId)
+			check_inplace_rel_lock(oldtup);
+#endif
+	} while (!inplace_xmax_lock(scan));
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * heap_inplace_update_finish - second phase of heap_inplace_update_scan()
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+heap_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	HeapTupleHeader htup = oldtup->t_data;
+	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
+	uint32		oldlen;
+	uint32		newlen;
+
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6210,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6243,191 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_update_cancel - abandon a heap_inplace_update_scan()
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+heap_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer = bslot->buffer;
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	systable_endscan(scan);
+}
+
+/*
+ * inplace_xmax_lock - protect inplace update from concurrent heap_update()
+ *
+ * This operates on the last tuple that systable_getnext() returned.  Evaluate
+ * whether the tuple's state is compatible with a no-key update.  Current
+ * transaction rowmarks are fine, as is KEY SHARE from any transaction.  If
+ * compatible, return true with the buffer exclusive-locked.  Otherwise,
+ * return false after blocking transactions, if any, have ended.
+ *
+ * One could modify this to return true for tuples with delete in progress, but
+ * all inplace updaters take a lock that conflicts with DROP.  If explicit
+ * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
+ * update.
+ *
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would require either (a) mutating all
+ * tuples in one critical section or (b) accepting a chance of partial
+ * completion.  Partial completion of a relfrozenxid update would have the
+ * weird consequence that the table's next VACUUM could see the table's
+ * relfrozenxid move forward between vacuum_get_cutoffs() and finishing.
+ */
+static bool
+inplace_xmax_lock(SysScanDesc scan)
+{
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTupleData oldtup = *bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+	TM_Result	result;
+	bool		ret;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.  C code of other SQL
+		 * statements might get here after a heap_update() of the same row, in
+		 * the absence of an intervening CommandCounterIncrement().
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+		Relation	relation;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+		relation = scan->heap_rel;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				systable_endscan(scan);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			systable_endscan(scan);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to avoid spinning
+	 * that ends with a "too many tries" error.  While we don't need this if
+	 * xwait aborted, don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
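Assembled from the header comment above, a call site of the new API reads
roughly as follows.  This editorial sketch mirrors the index_update_stats()
conversion in the next hunk; "relid" and "newval" are placeholders.

	ScanKeyData key[1];
	HeapTuple	tup;
	void	   *state;

	ScanKeyInit(&key[0], Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid));
	heap_inplace_update_scan(pg_class, ClassOidIndexId, true,
							 NULL, 1, key, &tup, &state);
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "could not find tuple for relation %u", relid);

	/* the buffer stays exclusive-locked until finish or cancel */
	if (((Form_pg_class) GETSTRUCT(tup))->relhasindex != newval)
	{
		((Form_pg_class) GETSTRUCT(tup))->relhasindex = newval;
		heap_inplace_update_finish(state, tup); /* writes, WAL-logs, unlocks */
	}
	else
		heap_inplace_update_cancel(state);	/* just unlocks */
	heap_freetuple(tup);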
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a819b41..b4b68b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2784,7 +2784,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2818,33 +2820,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key,
+							 &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2907,11 +2888,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		heap_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		heap_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..c882f3c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		heap_inplace_update_scan(class_rel, ClassOidIndexId, true,
+								 NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		heap_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea..da4d2b7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData key[1];
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
 	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
@@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(db_id));
+	heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	heap_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..22d0ce7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true,
+									 NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				heap_inplace_update_finish(state, tuple);
 			}
+			else
+				heap_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab..d299a25 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	heap_inplace_update_scan(rd, ClassOidIndexId, true,
+							 NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		heap_inplace_update_finish(inplace_state, ctup);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	heap_inplace_update_scan(relation, DatabaseOidIndexId, true,
+							 NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		heap_inplace_update_finish(inplace_state, tuple);
+	else
+		heap_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..2e13fb9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern void heap_inplace_update_scan(Relation relation,
+									 Oid indexId,
+									 bool indexOK,
+									 Snapshot snapshot,
+									 int nkeys, const ScanKeyData *key,
+									 HeapTuple *oldtupcopy, void **state);
+extern void heap_inplace_update_finish(void *state, HeapTuple tuple);
+extern void heap_inplace_update_cancel(void *state);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..c2a9841 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -122,17 +124,18 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..eed0b52 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -73,7 +73,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
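
To recap the API change in the hunks above: a bare heap_inplace_update() call
becomes a heap_inplace_update_scan()/heap_inplace_update_finish() pair, with
heap_inplace_update_cancel() on the no-change path.  A minimal caller sketch,
distilled from the vac_update_relstats() hunk (names as in the patch; error
paths and the actual field updates elided):

	ScanKeyData key[1];
	HeapTuple	ctup;
	void	   *inplace_state;

	/* returns a copy to scribble on; buffer stays exclusive-locked */
	ScanKeyInit(&key[0],
				Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(relid));
	heap_inplace_update_scan(rd, ClassOidIndexId, true,
							 NULL, 1, key, &ctup, &inplace_state);
	if (!HeapTupleIsValid(ctup))
		elog(ERROR, "pg_class entry for relid %u vanished", relid);

	/* ... modify the copy, tracking whether anything changed ... */

	if (dirty)
		heap_inplace_update_finish(inplace_state, ctup);	/* write + inval */
	else
		heap_inplace_update_cancel(inplace_state);	/* just release */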
inplace120-locktag-v8.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
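
In caller terms, a transactional update of a row in one of these catalogs now
holds the tuple lock across the copy, the modification, and the heap_update().
A minimal sketch mirroring the ExecGrant_Relation() and RenameDatabase() hunks
below (names as in the patch; construction of newtuple elided):

	/* take LOCKTAG_TUPLE before copying the old tuple */
	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for relation %u", relOid);

	/* ... form newtuple from the locked copy ... */

	CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
	/* release only after heap_update() has installed the successor */
	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);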

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index dbfa2b7..fb06ff2 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -157,8 +157,6 @@ is set.
 Locking to write inplace-updated tables
 ---------------------------------------
 
-[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.]
-
 If IsInplaceUpdateRelation() returns true for a table, the table is a system
 catalog that receives heap_inplace_update_scan() calls.  Preparing a
 heap_update() of these tables follows additional locking rules, to ensure we
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faec28a..051aa10 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -77,6 +79,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
 #ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
 static void check_inplace_rel_lock(HeapTuple oldtup);
 #endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
@@ -126,6 +131,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3212,6 +3219,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4078,6 +4089,89 @@ l2:
 
 #ifdef USE_ASSERT_CHECKING
 /*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
  * Confirm adequate relation lock held, per rules from README.tuplock section
  * "Locking to write inplace-updated tables".
  */
@@ -6123,6 +6217,7 @@ heap_inplace_update_scan(Relation relation,
 	int			retries = 0;
 	SysScanDesc scan;
 	HeapTuple	oldtup;
+	ItemPointerData locked;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -6144,6 +6239,7 @@ heap_inplace_update_scan(Relation relation,
 	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
 
 	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	ItemPointerSetInvalid(&locked);
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -6163,6 +6259,8 @@ heap_inplace_update_scan(Relation relation,
 		oldtup = systable_getnext(scan);
 		if (!HeapTupleIsValid(oldtup))
 		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
 			systable_endscan(scan);
 			*oldtupcopy = NULL;
 			return;
@@ -6172,6 +6270,15 @@ heap_inplace_update_scan(Relation relation,
 		if (RelationGetRelid(relation) == RelationRelationId)
 			check_inplace_rel_lock(oldtup);
 #endif
+
+		if (!(ItemPointerIsValid(&locked) &&
+			  ItemPointerEquals(&locked, &oldtup->t_self)))
+		{
+			if (ItemPointerIsValid(&locked))
+				UnlockTuple(relation, &locked, InplaceUpdateTupleLock);
+			LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+		}
+		locked = oldtup->t_self;
 	} while (!inplace_xmax_lock(scan));
 
 	*oldtupcopy = heap_copytuple(oldtup);
@@ -6183,6 +6290,8 @@ heap_inplace_update_scan(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_finish(void *state, HeapTuple tuple)
@@ -6249,6 +6358,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
 	END_CRIT_SECTION();
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 
 	/*
@@ -6274,9 +6384,12 @@ heap_inplace_update_cancel(void *state)
 	SysScanDesc scan = (SysScanDesc) state;
 	TupleTableSlot *slot = scan->slot;
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
 	Buffer		buffer = bslot->buffer;
+	Relation	relation = scan->heap_rel;
 
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 	systable_endscan(scan);
 }
 
@@ -6335,7 +6448,7 @@ inplace_xmax_lock(SysScanDesc scan)
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - caller handles tuple lock, since inplace needs it unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index da4d2b7..fd48022 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1864,6 +1864,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1935,11 +1936,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2188,6 +2191,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2196,6 +2200,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2426,6 +2431,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2475,6 +2481,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2524,6 +2531,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2552,6 +2560,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2560,14 +2569,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2679,6 +2689,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2700,6 +2712,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 22d0ce7..36d82bd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2caab88..8d04ca0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4409,14 +4409,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8fcb188..7fa80a5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3609,6 +3609,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3617,9 +3618,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3627,7 +3629,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4121,6 +4124,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4143,7 +4147,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4170,7 +4175,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14917,7 +14923,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15021,6 +15027,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17170,7 +17177,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17189,6 +17197,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17201,7 +17211,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17213,6 +17224,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d..321ad47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1209,6 +1209,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd..f18efdb 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -559,8 +559,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4913e49..02be418 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2321,6 +2321,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2329,6 +2331,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2422,6 +2425,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2526,6 +2537,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2851,6 +2870,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2887,14 +2907,32 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
+	{
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	}
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even for CMD_DELETE, for CMD_NOTHING, and for tuples
+			 * that don't match mas_whenqual.  MERGE on system catalogs is a
+			 * minor use case, so don't bother optimizing those.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2966,7 +3004,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2977,11 +3015,11 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecUpdateAct(context, resultRelInfo, tupleid,
@@ -3000,7 +3038,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3018,7 +3057,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3029,11 +3068,11 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3115,7 +3154,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3189,7 +3228,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3207,6 +3246,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3235,7 +3283,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3263,13 +3311,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3316,6 +3364,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3767,6 +3819,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4079,6 +4132,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4090,6 +4145,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4098,6 +4154,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4109,6 +4170,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 930cc03..3f1e8ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3770,6 +3770,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3812,11 +3813,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3936,9 +3938,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b62c96f..eab0add 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index c2a9841..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -154,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -194,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -223,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index eed0b52..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -73,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -126,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -138,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
#49Alexander Lakhin
exclusion@gmail.com
In reply to: Noah Misch (#40)
Re: race condition in pg_class

Hello Noah,

28.06.2024 08:13, Noah Misch wrote:

Pushed.

A recent buildfarm test failure [1] showed that the
intra-grant-inplace-db.spec test added with 0844b3968 may fail
on a slow machine (per my understanding):

test intra-grant-inplace-db       ... FAILED     4302 ms

@@ -21,8 +21,7 @@
     WHERE datname = current_catalog
         AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);

-?column?
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)

whilst the previous (successful) run shows a much shorter duration:
test intra-grant-inplace-db       ... ok          540 ms

I reproduced this failure on a VM slowed down so that the test duration
reached 4+ seconds, by putting 100 copies of "test: intra-grant-inplace-db"
in isolation_schedule:
test intra-grant-inplace-db       ... ok         4324 ms
test intra-grant-inplace-db       ... FAILED     4633 ms
test intra-grant-inplace-db       ... ok         4649 ms
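
(For reference, the schedule change amounts to something like the following,
run in a configured source tree; the exact invocation is from memory:)

for i in $(seq 100); do echo "test: intra-grant-inplace-db"; done \
  >> src/test/isolation/isolation_schedule
make -C src/test/isolation check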

But as the test is going to be modified by the inplace110-successors-v8.patch,
and the modified test (with all three latest patches applied) passes
reliably under the same conditions, maybe this failure doesn't deserve
deeper exploration.

What do you think?

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08

Best regards,
Alexander

#50Noah Misch
noah@leadboat.com
In reply to: Alexander Lakhin (#49)
Re: race condition in pg_class

On Sat, Jul 20, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:

28.06.2024 08:13, Noah Misch wrote:

Pushed.

A recent buildfarm test failure [1] showed that the
intra-grant-inplace-db.spec test added with 0844b3968 may fail
on a slow machine

But as the test is going to be modified by the inplace110-successors-v8.patch,
and the modified test (with all three latest patches applied) passes
reliably under the same conditions, maybe this failure doesn't deserve
deeper exploration.

Agreed. Let's just wait for code review of the actual bug fix, not develop a
separate change to stabilize the test. One flake in three weeks is low enough
to make that okay.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08

#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#50)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Sat, Jul 20, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:

A recent buildfarm test failure [1] showed that the
intra-grant-inplace-db.spec test added with 0844b3968 may fail
on a slow machine

But as the test is going to be modified by the inplace110-successors-v8.patch,
and the modified test (with all three latest patches applied) passes
reliably under the same conditions, maybe this failure doesn't deserve
deeper exploration.

Agreed. Let's just wait for code review of the actual bug fix, not develop a
separate change to stabilize the test. One flake in three weeks is low enough
to make that okay.

It's now up to three similar failures in the past ten days: in
addition to

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08

I see

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=urutu&dt=2024-07-22%2018%3A00%3A46

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=taipan&dt=2024-07-28%2012%3A20%3A37

Is it time to worry yet? If this were HEAD only, I'd not be too
concerned; but two of these three are on allegedly-stable branches.
And we have releases coming up fast.

(BTW, I don't think taipan qualifies as a slow machine.)

regards, tom lane

#52Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#51)
Re: race condition in pg_class

On Sun, Jul 28, 2024 at 11:50:33AM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Sat, Jul 20, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:

A recent buildfarm test failure [1] showed that the
intra-grant-inplace-db.spec test added with 0844b3968 may fail

But as the test is going to be modified by the inplace110-successors-v8.patch,
and the modified test (with all three latest patches applied) passes
reliably under the same conditions, maybe this failure doesn't deserve
deeper exploration.

Agreed. Let's just wait for code review of the actual bug fix, not develop a
separate change to stabilize the test. One flake in three weeks is low enough
to make that okay.

It's now up to three similar failures in the past ten days

Is it time to worry yet? If this were HEAD only, I'd not be too
concerned; but two of these three are on allegedly-stable branches.
And we have releases coming up fast.

I don't know; neither decision feels terrible to me. A bug fix that would
address both the data corruption causes and those buildfarm failures has been
awaiting review on this thread for 77 days. The data corruption causes are
more problematic than 0.03% of buildfarm runs getting noise failures. Two
wrongs don't make a right, but a commit masking that level of buildfarm noise
also feels like sending the wrong message.

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#52)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Sun, Jul 28, 2024 at 11:50:33AM -0400, Tom Lane wrote:

Is it time to worry yet? If this were HEAD only, I'd not be too
concerned; but two of these three are on allegedly-stable branches.
And we have releases coming up fast.

I don't know; neither decision feels terrible to me.

Yeah, same here. Obviously, it'd be better to spend effort on getting
the bug fix committed than to spend effort on some cosmetic
workaround.

The fact that the failure is in the isolation tests not the core
regression tests reduces my level of concern somewhat about shipping
it this way. I think that packagers typically run the core tests
not check-world during package verification, so they won't hit this.

regards, tom lane

#54Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Noah Misch (#48)
Re: race condition in pg_class

On 14/07/2024 20:48, Noah Misch wrote:

I've pushed the two patches for your reports. To placate cfbot, I'm attaching
the remaining patches.

inplace090-LOCKTAG_TUPLE-eoxact-v8.patch: Makes sense. A comment would
be in order; it looks pretty random as it is. Something like:

/*
* Tuple locks are currently only held for short durations within a
* transaction. Check that we didn't forget to release one.
*/

inplace110-successors-v8.patch: Makes sense.

The README changes would be better as part of the third patch, as this
patch doesn't actually do any of the new locking described in the
README, and it fixes the "inplace update updates wrong tuple" bug even
without those tuple locks.

+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);

I wonder if the functions should be called "systable_*" and placed in
genam.c rather than in heapam.c. The interface looks more like the
existing systable functions. It feels like a modularity violation for a
function in heapam.c to take an argument like "indexId", and call back
into systable_* functions.

/*----------
* XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
*
* ["D" is a VACUUM (ONLY_DATABASE_STATS)]
* ["R" is a VACUUM tbl]
* D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
* D: systable_getnext() returns pg_class tuple of tbl
* R: memcpy() into pg_class tuple of tbl
* D: raise pg_database.datfrozenxid, XLogInsert(), finish
* [crash]
* [recovery restores datfrozenxid w/o relfrozenxid]
*/

Hmm, that's a tight race, but feels bad to leave it unfixed. One
approach would be to modify the tuple on the buffer only after
WAL-logging it. That way, D cannot read the updated value before it has
been WAL logged. Just need to make sure that the change still gets
included in the WAL record. Maybe something like:

if (RelationNeedsWAL(relation))
{
/*
* Make a temporary copy of the page that includes the change, in
* case a full-page image is logged
*/
PGAlignedBlock tmppage;

memcpy(tmppage.data, page, BLCKSZ);

/* copy the tuple to the temporary copy */
memcpy(...);

XLogRegisterBlock(0, ..., tmppage, REGBUF_STANDARD);
XLogInsert();
}

/* copy the tuple to the buffer */
memcpy(...);

pg_class heap_inplace_update_scan() callers: before the call, acquire
LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
(VACUUM), or a mode with strictly more conflicts. If the update targets a
row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
on the table. Locking the index rel is optional. (This allows VACUUM to
overwrite per-index pg_class while holding a lock on the table alone.) We
could allow weaker locks, in which case the next paragraph would simply call
for stronger locks for its class of commands. heap_inplace_update_scan()
acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
ExclusiveLock, on each tuple it overwrites.

pg_class heap_update() callers: before copying the tuple to modify, take a
lock that conflicts with at least one of those from the preceding paragraph.
SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
After heap_update(), release any LOCKTAG_TUPLE. Most of these callers opt
to acquire just the LOCKTAG_RELATION.

These rules seem complicated. Phrasing this slightly differently, if I
understand correctly: for a heap_update() caller, it's always sufficient
to hold LOCKTAG_TUPLE, but if you happen to hold some other lock on the
relation that conflicts with those mentioned in the first paragraph,
then you can skip the LOCKTAG_TUPLE lock.

Could we just stipulate that you must always hold LOCKTAG_TUPLE when you
call heap_update() on pg_class or pg_database? That'd make the rule simple.
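
In code, I'd expect every such caller to then follow roughly the pattern
your RelationSetNewRelfilenumber() hunk uses. A sketch only; "relid" and
the relhasindex change are stand-ins for whatever the caller actually
modifies:

	Relation	pg_class;
	HeapTuple	tuple;
	ItemPointerData otid;

	pg_class = table_open(RelationRelationId, RowExclusiveLock);
	/* returns a copy, with LOCKTAG_TUPLE held in InplaceUpdateTupleLock */
	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for relation %u", relid);
	otid = tuple->t_self;
	((Form_pg_class) GETSTRUCT(tuple))->relhasindex = true;
	CatalogTupleUpdate(pg_class, &otid, tuple);
	/* release the tuple lock promptly after the update */
	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
	heap_freetuple(tuple);
	table_close(pg_class, RowExclusiveLock);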

--
Heikki Linnakangas
Neon (https://neon.tech)

#55Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#54)
Re: race condition in pg_class

Thanks for reviewing.

On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote:

On 14/07/2024 20:48, Noah Misch wrote:

I've pushed the two patches for your reports. To placate cfbot, I'm attaching
the remaining patches.

inplace090-LOCKTAG_TUPLE-eoxact-v8.patch: Makes sense. A comment would be in
order; it looks pretty random as it is. Something like:

/*
* Tuple locks are currently only held for short durations within a
* transaction. Check that we didn't forget to release one.
*/

Will add.

inplace110-successors-v8.patch: Makes sense.

The README changes would be better as part of the third patch, as this patch
doesn't actually do any of the new locking described in the README, and it
fixes the "inplace update updates wrong tuple" bug even without those tuple
locks.

That should work. Will confirm.

+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);

I wonder if the functions should be called "systable_*" and placed in
genam.c rather than in heapam.c. The interface looks more like the existing
systable functions. It feels like a modularity violation for a function in
heapam.c to take an argument like "indexId", and call back into systable_*
functions.

Yes, _scan() and _cancel() especially are wrappers around systable. Some API
options follow. Any preference or other ideas?

==== direct s/heap_/systable_/ rename

systable_inplace_update_scan([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
systable_inplace_update_finish(inplace_state, tup);
else
systable_inplace_update_cancel(inplace_state);

==== make the first and last steps more systable-like

systable_inplace_update_begin([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
systable_inplace_update(inplace_state, tup);
systable_inplace_update_end(inplace_state);

==== no systable_ wrapper for middle step, more like CatalogTupleUpdate

systable_inplace_update_begin([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
heap_inplace_update(relation,
systable_inplace_old_tuple(inplace_state),
tup,
systable_inplace_buffer(inplace_state));
systable_inplace_update_end(inplace_state);

/*----------
* XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
*
* ["D" is a VACUUM (ONLY_DATABASE_STATS)]
* ["R" is a VACUUM tbl]
* D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
* D: systable_getnext() returns pg_class tuple of tbl
* R: memcpy() into pg_class tuple of tbl
* D: raise pg_database.datfrozenxid, XLogInsert(), finish
* [crash]
* [recovery restores datfrozenxid w/o relfrozenxid]
*/

Hmm, that's a tight race, but feels bad to leave it unfixed. One approach
would be to modify the tuple on the buffer only after WAL-logging it. That
way, D cannot read the updated value before it has been WAL logged. Just
need to make sure that the change still gets included in the WAL record.
Maybe something like:

if (RelationNeedsWAL(relation))
{
/*
* Make a temporary copy of the page that includes the change, in
* case a full-page image is logged
*/
PGAlignedBlock tmppage;

memcpy(tmppage.data, page, BLCKSZ);

/* copy the tuple to the temporary copy */
memcpy(...);

XLogRegisterBlock(0, ..., tmppage, REGBUF_STANDARD);
XLogInsert();
}

/* copy the tuple to the buffer */
memcpy(...);

Yes, that's the essence of
inplace180-datfrozenxid-overtakes-relfrozenxid-v1.patch from
https://postgr.es/m/20240620012908.92.nmisch@google.com.

pg_class heap_inplace_update_scan() callers: before the call, acquire
LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
(VACUUM), or a mode with strictly more conflicts. If the update targets a
row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
on the table. Locking the index rel is optional. (This allows VACUUM to
overwrite per-index pg_class while holding a lock on the table alone.) We
could allow weaker locks, in which case the next paragraph would simply call
for stronger locks for its class of commands. heap_inplace_update_scan()
acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
ExclusiveLock, on each tuple it overwrites.

pg_class heap_update() callers: before copying the tuple to modify, take a
lock that conflicts with at least one of those from the preceding paragraph.
SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
After heap_update(), release any LOCKTAG_TUPLE. Most of these callers opt
to acquire just the LOCKTAG_RELATION.

These rules seem complicated. Phrasing this slightly differently, if I
understand correctly: for a heap_update() caller, it's always sufficient to
hold LOCKTAG_TUPLE, but if you happen to hold some other lock on the
relation that conflicts with those mentioned in the first paragraph, then
you can skip the LOCKTAG_TUPLE lock.

Yes.

Could we just stipulate that you must always hold LOCKTAG_TUPLE when you
call heap_update() on pg_class or pg_database? That'd make the rule simple.

We could. That would change more code sites. Rough estimate:

$ git grep -E CatalogTupleUpd'.*(class|relrelation|relationRelation)' | wc -l
23

If the count were 2, I'd say let's simplify the rule like you're exploring.
(I originally had a complicated rule for pg_database, but I abandoned that
when it helped few code sites.) If it were 100, I'd say the complicated rule
is worth it. A count of 23 makes both choices fair.

Long-term, I hope relfrozenxid gets reimplemented with storage outside
pg_class, removing the need for inplace updates. So the additional 23 code
sites might change back at a future date. That shouldn't be a big
consideration, though.

Another option here would be to preface that README section with a simplified
view, something like, "If a warning brought you here, take a tuple lock. The
rest of this section is just for people needing to understand the conditions
for --enable-casserts emitting that warning." How about that instead of
simplifying the rules?

#56Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Noah Misch (#55)
Re: race condition in pg_class

On 17/08/2024 07:07, Noah Misch wrote:

On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote:

On 14/07/2024 20:48, Noah Misch wrote:

+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *	heap_inplace_update_cancel(inplace_state);

I wonder if the functions should be called "systable_*" and placed in
genam.c rather than in heapam.c. The interface looks more like the existing
systable functions. It feels like a modularity violation for a function in
heapam.c to take an argument like "indexId", and call back into systable_*
functions.

Yes, _scan() and _cancel() especially are wrappers around systable. Some API
options follow. Any preference or other ideas?

==== direct s/heap_/systable_/ rename [option 1]

systable_inplace_update_scan([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
systable_inplace_update_finish(inplace_state, tup);
else
systable_inplace_update_cancel(inplace_state);

==== make the first and last steps more systable-like [option 2]

systable_inplace_update_begin([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
systable_inplace_update(inplace_state, tup);
systable_inplace_update_end(inplace_state);

==== no systable_ wrapper for middle step, more like CatalogTupleUpdate [option 3]

systable_inplace_update_begin([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
heap_inplace_update(relation,
systable_inplace_old_tuple(inplace_state),
tup,
systable_inplace_buffer(inplace_state));
systable_inplace_update_end(inplace_state);

My order of preference is: 2, 1, 3.

Could we just stipulate that you must always hold LOCKTAG_TUPLE when you
call heap_update() on pg_class or pg_database? That'd make the rule simple.

We could. That would change more code sites. Rough estimate:

$ git grep -E CatalogTupleUpd'.*(class|relrelation|relationRelation)' | wc -l
23

If the count were 2, I'd say let's simplify the rule like you're exploring.
(I originally had a complicated rule for pg_database, but I abandoned that
when it helped few code sites.) If it were 100, I'd say the complicated rule
is worth it. A count of 23 makes both choices fair.

Ok.

How many of those are for RELKIND_INDEX vs. tables? I'm wondering whether we
should always require a tuple lock on indexes, if that would make a difference.

Long-term, I hope relfrozenxid gets reimplemented with storage outside
pg_class, removing the need for inplace updates. So the additional 23 code
sites might change back at a future date. That shouldn't be a big
consideration, though.

Another option here would be to preface that README section with a simplified
view, something like, "If a warning brought you here, take a tuple lock. The
rest of this section is just for people needing to understand the conditions
for --enable-casserts emitting that warning." How about that instead of
simplifying the rules?

Works for me. Or perhaps the rules could just be explained more
succinctly. Something like:

-----
pg_class heap_inplace_update_scan() callers: before the call, acquire a
lock on the relation in mode ShareUpdateExclusiveLock or stricter. If
the update targets a row of RELKIND_INDEX (but not
RELKIND_PARTITIONED_INDEX), that lock must be on the table; locking the
index rel is not necessary. (This allows VACUUM to overwrite per-index
pg_class while holding a lock on the table alone.)
heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE in
InplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple it
overwrites.

pg_class heap_update() callers: before copying the tuple to modify, take
a lock on the tuple, or a ShareUpdateExclusiveLock or stricter on the
relation.

SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
Most heap_update() callers already hold a suitable lock on the relation
for other reasons, and can skip the tuple lock. If you do acquire the
tuple lock, release it immediately after the update.

pg_database: before copying the tuple to modify, all updaters of
pg_database rows acquire LOCKTAG_TUPLE. (Few updaters acquire
LOCKTAG_OBJECT on the database OID, so it wasn't worth extending that as
a second option.)
-----

--
Heikki Linnakangas
Neon (https://neon.tech)

#57Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#56)
4 attachment(s)
Re: race condition in pg_class

On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:

On 17/08/2024 07:07, Noah Misch wrote:

On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote:

I wonder if the functions should be called "systable_*" and placed in
genam.c rather than in heapam.c. The interface looks more like the existing
systable functions. It feels like a modularity violation for a function in
heapam.c to take an argument like "indexId", and call back into systable_*
functions.

Yes, _scan() and _cancel() especially are wrappers around systable. Some API
options follow. Any preference or other ideas?

==== direct s/heap_/systable_/ rename [option 1]

systable_inplace_update_scan([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
systable_inplace_update_finish(inplace_state, tup);
else
systable_inplace_update_cancel(inplace_state);

==== make the first and last steps more systable-like [option 2]

systable_inplace_update_begin([...], &tup, &inplace_state);
if (!HeapTupleIsValid(tup))
elog(ERROR, [...]);
... [buffer is exclusive-locked; mutate "tup"] ...
if (dirty)
systable_inplace_update(inplace_state, tup);
systable_inplace_update_end(inplace_state);

My order of preference is: 2, 1, 3.

I kept tuple locking responsibility in heapam.c. That's simpler and better
for modularity, but it does mean we release+acquire after any xmax wait.
Before, we avoided that if the next genam.c scan found the same TID. (If the
next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
are rare enough to justify simplifying as this version does. I don't expect
anyone to notice the starvation outside of tests built to show it. (With
previous versions, one can show it with a purpose-built test that commits
instead of aborting, like the "001_pgbench_grant@9" test.)

This move also loses the optimization of unpinning before XactLockTableWait().
heap_update() doesn't optimize that way, so that's fine.

The move ended up more like (1), though I did do
s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
felt that worked better than (2) to achieve lock release before
CacheInvalidateHeapTuple(). Alternatives that could be fine:

- In the cancel case, call both systable_inplace_update_cancel and
systable_inplace_update_end. _finish or _cancel would own unlock, while
_end would own systable_endscan().

- Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
tolerable now, this gets less attractive after the inplace160 patch from
https://postgr.es/m/20240523000548.58.nmisch@google.com

I made the other changes we discussed, also.

Could we just stipulate that you must always hold LOCKTAG_TUPLE when you
call heap_update() on pg_class or pg_database? That'd make the rule simple.

We could. That would change more code sites. Rough estimate:

$ git grep -E CatalogTupleUpd'.*(class|relrelation|relationRelation)' | wc -l
23

How many of those are for RELKIND_INDEX vs. tables? I'm wondering whether we
should always require a tuple lock on indexes, if that would make a difference.

Three sites. See attached inplace125 patch. Is it a net improvement? If so,
I'll squash it into inplace120.

Another option here would be to preface that README section with a simplified
view, something like, "If a warning brought you here, take a tuple lock. The
rest of this section is just for people needing to understand the conditions
for --enable-casserts emitting that warning." How about that instead of
simplifying the rules?

Works for me. Or perhaps the rules could just be explained more succinctly.
Something like:

-----

I largely used your text instead.

While doing these updates, I found an intra-grant-inplace.spec permutation
being flaky on inplace110 but stable on inplace120. That turned out not to be
v9-specific. As of patch v1, I now see it was already flaky (~5% failure
here). I've now added to inplace110 a minimal tweak to stabilize that spec,
which inplace120 removes.

Thanks,
nm

Attachments:

inplace090-LOCKTAG_TUPLE-eoxact-v9.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..e5e7ab5 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,16 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Tuple locks are currently held only for short durations within a
+		 * transaction. Check that we didn't forget to release one.
+		 */
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
inplace110-successors-v9.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the systable_inplace_update_begin()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by Heikki Linnakangas and Alexander Lakhin.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..ddb2def 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,14 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+systable_inplace_update_finish().
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..24f7e62 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -63,7 +63,6 @@
 #include "storage/procarray.h"
 #include "storage/standby.h"
 #include "utils/datum.h"
-#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
@@ -6041,61 +6040,166 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_lock - protect inplace update from concurrent heap_update()
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Evaluate whether the tuple's state is compatible with a no-key update.
+ * Current transaction rowmarks are fine, as is KEY SHARE from any
+ * transaction.  If compatible, return true with the buffer exclusive-locked,
+ * and the caller must release that by calling heap_inplace_update(), calling
+ * heap_inplace_unlock(), or raising an error.  Otherwise, return false after
+ * blocking transactions, if any, have ended.
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this doesn't guarantee any particular predicate locking.
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * One could modify this to return true for tuples with delete in progress.
+ * All inplace updaters take a lock that conflicts with DROP.  If explicit
+ * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
+ * update.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would imply either (a) mutating all tuples
+ * in one critical section or (b) accepting a chance of partial completion.
+ * Partial completion of a relfrozenxid update would have the weird
+ * consequence that the table's next VACUUM could see the table's relfrozenxid
+ * move forward between vacuum_get_cutoffs() and finishing.
+ */
+bool
+heap_inplace_lock(Relation relation,
+				  HeapTuple oldtup_ptr, Buffer buffer)
+{
+	HeapTupleData oldtup = *oldtup_ptr; /* minimize diff vs. heap_update() */
+	TM_Result	result;
+	bool		ret;
+
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.  C code of other SQL
+		 * statements might get here after a heap_update() of the same row, in
+		 * the absence of an intervening CommandCounterIncrement().
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to ensure a later
+	 * attempt has a fair chance.  While we don't need this if xwait aborted,
+	 * don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
+/*
+ * heap_inplace_update - subroutine of systable_inplace_update_finish
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update(Relation relation,
+					HeapTuple oldtup, HeapTuple tuple,
+					Buffer buffer)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
+	HeapTupleHeader htup = oldtup->t_data;
 	uint32		oldlen;
 	uint32		newlen;
 
-	/*
-	 * For now, we don't allow parallel updates.  Unlike a regular update,
-	 * this should never create a combo CID, so it might be possible to relax
-	 * this restriction, but not without more thought and testing.  It's not
-	 * clear that it would be useful, anyway.
-	 */
-	if (IsInParallelMode())
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
-				 errmsg("cannot update tuples during a parallel operation")));
-
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
-
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
-
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
-
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
-
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6211,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6244,35 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	heap_inplace_unlock(relation, oldtup, buffer);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_unlock - reverse of heap_inplace_lock
+ */
+void
+heap_inplace_unlock(Relation relation,
+					HeapTuple oldtup, Buffer buffer)
+{
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 43c95d6..5f55e8c 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/transam.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
+#include "utils/injection_point.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/rls.h"
@@ -747,3 +749,140 @@ systable_endscan_ordered(SysScanDesc sysscan)
 		UnregisterSnapshot(sysscan->snapshot);
 	pfree(sysscan);
 }
+
+/*
+ * systable_inplace_update_begin --- update a row "in place" (overwrite it)
+ *
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  Standard flow:
+ *
+ * ... [any slow preparation not requiring oldtup] ...
+ * systable_inplace_update_begin([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	systable_inplace_update_finish(inplace_state, tup);
+ * else
+ *	systable_inplace_update_cancel(inplace_state);
+ *
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to systable_inplace_update_finish() or
+ * systable_inplace_update_cancel().
+ */
+void
+systable_inplace_update_begin(Relation relation,
+							  Oid indexId,
+							  bool indexOK,
+							  Snapshot snapshot,
+							  int nkeys, const ScanKeyData *key,
+							  HeapTuple *oldtupcopy,
+							  void **state)
+{
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
+
+	/*
+	 * For now, we don't allow parallel updates.  Unlike a regular update,
+	 * this should never create a combo CID, so it might be possible to relax
+	 * this restriction, but not without more thought and testing.  It's not
+	 * clear that it would be useful, anyway.
+	 */
+	if (IsInParallelMode())
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+				 errmsg("cannot update tuples during a parallel operation")));
+
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
+
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
+
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+		BufferHeapTupleTableSlot *bslot;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
+
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+		slot = scan->slot;
+		Assert(TTS_IS_BUFFERTUPLE(slot));
+		bslot = (BufferHeapTupleTableSlot *) slot;
+		if (heap_inplace_lock(scan->heap_rel,
+							  bslot->base.tuple, bslot->buffer))
+			break;
+		systable_endscan(scan);
+	};
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * systable_inplace_update_finish --- second phase of inplace update
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+systable_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	Relation	relation = scan->heap_rel;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+
+	heap_inplace_update(relation, oldtup, tuple, buffer);
+	systable_endscan(scan);
+}
+
+/*
+ * systable_inplace_update_cancel --- abandon inplace update
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+systable_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	Relation	relation = scan->heap_rel;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+
+	heap_inplace_unlock(relation, oldtup, buffer);
+	systable_endscan(scan);
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3375905..e4608b9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2785,7 +2785,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2819,33 +2821,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	systable_inplace_update_begin(pg_class, ClassOidIndexId, true, NULL,
+								  1, key, &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2908,11 +2889,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		systable_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		systable_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..ad3082c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		systable_inplace_update_begin(class_rel, ClassOidIndexId, true,
+									  NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		systable_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index d00ae40..86a08d7 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1651,7 +1651,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	Relation	pgdbrel;
 	HeapTuple	tup;
 	ScanKeyData scankey;
-	SysScanDesc scan;
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1790,24 +1790,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgstat_drop_database(db_id);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
-	 */
-	ScanKeyInit(&scankey,
-				Anum_pg_database_datname,
-				BTEqualStrategyNumber, F_NAMEEQ,
-				CStringGetDatum(dbname));
-
-	scan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
-							  NULL, 1, &scankey);
-
-	tup = systable_getnext(scan);
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
-	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
 	 * buffers). But we might crash or get interrupted below. To prevent
@@ -1818,8 +1800,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&scankey,
+				Anum_pg_database_datname,
+				BTEqualStrategyNumber, F_NAMEEQ,
+				CStringGetDatum(dbname));
+	systable_inplace_update_begin(pgdbrel, DatabaseNameIndexId, true,
+								  NULL, 1, &scankey, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	systable_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1827,8 +1818,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
-
-	systable_endscan(scan);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..55baf10 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			systable_inplace_update_begin(pg_db, DatabaseOidIndexId, true,
+										  NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				systable_inplace_update_finish(state, tuple);
 			}
+			else
+				systable_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7d8e9d2..9304b8c 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1402,7 +1402,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1413,7 +1415,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	systable_inplace_update_begin(rd, ClassOidIndexId, true,
+								  NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1521,7 +1528,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		systable_inplace_update_finish(inplace_state, ctup);
+	else
+		systable_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1573,6 +1582,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1696,20 +1706,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	systable_inplace_update_begin(relation, DatabaseOidIndexId, true,
+								  NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1743,7 +1751,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		systable_inplace_update_finish(inplace_state, tuple);
+	else
+		systable_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index fdcfbe8..c25f5d1 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -233,5 +233,14 @@ extern SysScanDesc systable_beginscan_ordered(Relation heapRelation,
 extern HeapTuple systable_getnext_ordered(SysScanDesc sysscan,
 										  ScanDirection direction);
 extern void systable_endscan_ordered(SysScanDesc sysscan);
+extern void systable_inplace_update_begin(Relation relation,
+										  Oid indexId,
+										  bool indexOK,
+										  Snapshot snapshot,
+										  int nkeys, const ScanKeyData *key,
+										  HeapTuple *oldtupcopy,
+										  void **state);
+extern void systable_inplace_update_finish(void *state, HeapTuple tuple);
+extern void systable_inplace_update_cancel(void *state);
 
 #endif							/* GENAM_H */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..85ad32a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,13 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern bool heap_inplace_lock(Relation relation,
+							  HeapTuple oldtup_ptr, Buffer buffer);
+extern void heap_inplace_update(Relation relation,
+								HeapTuple oldtup, HeapTuple tuple,
+								Buffer buffer);
+extern void heap_inplace_unlock(Relation relation,
+								HeapTuple oldtup, Buffer buffer);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..fe26984 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -98,7 +100,7 @@ f
 step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
 step c2: COMMIT;
 
-starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
+starting permutation: b3 sfu3 b1 grant1 read2 as3 addk2 r3 c1 read2
 step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
 step sfu3: 
 	SELECT relhasindex FROM pg_class
@@ -122,17 +124,19 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step as3: LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..d07ed3b 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -48,6 +48,7 @@ step sfu3	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
 }
+step as3	{ LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE; }
 step r3	{ ROLLBACK; }
 
 # Additional heap_update()
@@ -73,7 +74,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
@@ -117,6 +118,7 @@ permutation
 	b1
 	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
 	read2
+	as3			# XXX temporary until patch adds locking to addk2
 	addk2(c1)	# block in LockTuple() behind grant1
 	r3			# unblock grant1; addk2 now awaits grant1 xmax
 	c1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
Attachment: inplace120-locktag-v9.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Make heap_update() callers wait for inplace update.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
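
In code terms, the rule this patch imposes on pg_class heap_update() callers
that lack a strong-enough relation lock reduces to the sketch below, mirroring
the RenameRelationInternal() and update_relispartition() changes that follow.
SearchSysCacheLockedCopy1() is added by this series and returns the tuple copy
with LOCKTAG_TUPLE already held in InplaceUpdateTupleLock mode; the helper
name and the rename example here are illustrative only.

/* requires access/htup_details.h, catalog/indexing.h, storage/lmgr.h,
 * utils/syscache.h */
static void
example_update_pg_class_row(Relation pg_class, Oid relid, const char *newname)
{
	HeapTuple	tuple;
	ItemPointerData otid;

	/* takes LOCKTAG_TUPLE before handing us the copy to modify */
	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for relation %u", relid);
	otid = tuple->t_self;		/* the TID the tuple lock names */

	namestrcpy(&((Form_pg_class) GETSTRUCT(tuple))->relname, newname);
	CatalogTupleUpdate(pg_class, &otid, tuple);

	/* release immediately after the update, per README.tuplock */
	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
	heap_freetuple(tuple);
}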

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index ddb2def..95828ce 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -154,6 +154,48 @@ The following infomask bits are applicable:
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
 
+Locking to write inplace-updated tables
+---------------------------------------
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives systable_inplace_update_begin() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class systable_inplace_update_begin() callers: before the call, acquire
+  a lock on the relation in mode ShareUpdateExclusiveLock or stricter.  If the
+  update targets a row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX),
+  that lock must be on the table.  Locking the index rel is not necessary.
+  (This allows VACUUM to overwrite per-index pg_class while holding a lock on
+  the table alone.)  systable_inplace_update_begin() acquires and releases
+  LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each
+  tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock on the tuple, a ShareUpdateExclusiveLock on the relation, or a
+  ShareRowExclusiveLock or stricter on the relation.
+
+  SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
+  Most heap_update() callers already hold a suitable lock on the relation for
+  other reasons and can skip the tuple lock.  If you do acquire the tuple
+  lock, release it immediately after the update.
+
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
 Reading inplace-updated columns
 -------------------------------
 
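The pg_database side of the rule, as applied in the movedb() and
AlterDatabase() hunks below, condenses to the following sketch.  The helper
name and parameters are illustrative; oldtuple is assumed to come from the
caller's usual syscache or systable lookup.

/* requires access/htup_details.h, catalog/indexing.h, storage/lmgr.h */
static void
example_update_pg_database_row(Relation pgdbrel, HeapTuple oldtuple,
							   Datum *values, bool *nulls, bool *replaces)
{
	HeapTuple	newtuple;

	/* every updater takes the tuple lock before copying the row */
	LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);

	newtuple = heap_modify_tuple(oldtuple, RelationGetDescr(pgdbrel),
								 values, nulls, replaces);
	CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);

	/* inplace updaters contend on the same LOCKTAG_TUPLE, so release it
	 * as soon as the new tuple is in the buffer */
	UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
	heap_freetuple(newtuple);
}
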
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 24f7e62..7de60c1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -75,6 +77,12 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -121,6 +129,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3207,6 +3217,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4071,6 +4085,128 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6087,15 +6223,21 @@ heap_inplace_lock(Relation relation,
 	TM_Result	result;
 	bool		ret;
 
+#ifdef USE_ASSERT_CHECKING
+	if (RelationGetRelid(relation) == RelationRelationId)
+		check_inplace_rel_lock(oldtup_ptr);
+#endif
+
 	Assert(BufferIsValid(buffer));
 
+	LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/*----------
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - already locked tuple above, since inplace needs that unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
@@ -6179,7 +6321,10 @@ heap_inplace_lock(Relation relation,
 	 * don't bother optimizing that.
 	 */
 	if (!ret)
+	{
+		UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 		InvalidateCatalogSnapshot();
+	}
 	return ret;
 }
 
@@ -6188,6 +6333,8 @@ heap_inplace_lock(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update(Relation relation,
@@ -6271,6 +6418,7 @@ heap_inplace_unlock(Relation relation,
 					HeapTuple oldtup, Buffer buffer)
 {
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 }
 
 #define		FRM_NOOP				0x0001
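
One detail worth noting in the heapam.c changes: heap_inplace_lock() takes the
heavyweight tuple lock before the buffer lock, and heap_inplace_unlock()
releases them in the opposite order, so the backend never sleeps on a
heavyweight lock while holding a buffer LWLock.  Condensed pairing (oldtup
usage normalized for readability):

	/* acquire order in heap_inplace_lock() */
	LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

	/* release order in heap_inplace_unlock() is the mirror image */
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);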
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5f55e8c..a3f2dc2 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -755,7 +755,9 @@ systable_endscan_ordered(SysScanDesc sysscan)
  *
  * Overwriting violates both MVCC and transactional safety, so the uses of
  * this function in Postgres are extremely limited.  Nonetheless we find some
- * places to use it.  Standard flow:
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
  * ... [any slow preparation not requiring oldtup] ...
  * systable_inplace_update_begin([...], &tup, &inplace_state);
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 86a08d7..2987ce9 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1877,6 +1877,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1948,11 +1949,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2201,6 +2204,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2209,6 +2213,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2439,6 +2444,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2488,6 +2494,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2537,6 +2544,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2565,6 +2573,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2573,14 +2582,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2692,6 +2702,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2713,6 +2725,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 55baf10..05a6de6 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c5a56c7..6b22a88 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4413,14 +4413,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 52ce6b0..03278f6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3590,6 +3590,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3598,9 +3599,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3608,7 +3610,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4102,6 +4105,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4124,7 +4128,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4151,7 +4156,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14926,7 +14932,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -15030,6 +15036,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17179,7 +17186,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17198,6 +17206,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17210,7 +17220,9 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
+
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17222,6 +17234,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186f..f880f90 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1204,6 +1204,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 1086cbc..54025c9 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -661,8 +661,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80..1161520 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2324,6 +2324,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2332,6 +2334,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2425,6 +2428,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2529,6 +2540,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2854,6 +2873,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2890,14 +2910,32 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
+	{
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	}
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even for CMD_DELETE, for CMD_NOTHING, and for tuples
+			 * that don't match mas_whenqual.  MERGE on system catalogs is a
+			 * minor use case, so don't bother optimizing those.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2969,7 +3007,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2980,11 +3018,11 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecUpdateAct(context, resultRelInfo, tupleid,
@@ -3003,7 +3041,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3021,7 +3060,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3032,11 +3071,11 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3118,7 +3157,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3192,7 +3231,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3210,6 +3249,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3238,7 +3286,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3266,13 +3314,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3319,6 +3367,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3770,6 +3822,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4082,6 +4135,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4093,6 +4148,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4101,6 +4157,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4112,6 +4173,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66ed24e..5abb97c 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3768,6 +3768,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3810,11 +3811,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3934,9 +3936,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index af7d8fd..b078a6e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index fe26984..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -100,7 +100,7 @@ f
 step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
 step c2: COMMIT;
 
-starting permutation: b3 sfu3 b1 grant1 read2 as3 addk2 r3 c1 read2
+starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
 step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
 step sfu3: 
 	SELECT relhasindex FROM pg_class
@@ -124,7 +124,6 @@ relhasindex
 f          
 (1 row)
 
-step as3: LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE;
 step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
@@ -155,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -195,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -224,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index d07ed3b..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -48,7 +50,6 @@ step sfu3	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
 }
-step as3	{ LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE; }
 step r3	{ ROLLBACK; }
 
 # Additional heap_update()
@@ -74,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -118,7 +117,6 @@ permutation
 	b1
 	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
 	read2
-	as3			# XXX temporary until patch adds locking to addk2
 	addk2(c1)	# block in LockTuple() behind grant1
 	r3			# unblock grant1; addk2 now awaits grant1 xmax
 	c1
@@ -128,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -140,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
inplace125-no-exception-for-indexes-v9.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Require tuple locks for heap_update() of RELKIND_INDEX pg_class rows.
    
    [To be squashed into inplace120-locktag if we're keeping it]

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 95828ce..b69a0e4 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -168,18 +168,17 @@ using locktags as follows.  While DDL code is the main audience, the executor
 follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
 are per-catalog:
 
-  pg_class heap_inplace_update_scan() callers: before the call, acquire a lock
-  on the relation in mode ShareUpdateExclusiveLock or stricter.  If the update
-  targets a row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that
-  lock must be on the table.  Locking the index rel is not necessary.  (This
-  allows VACUUM to overwrite per-index pg_class while holding a lock on the
-  table alone.) heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE
-  in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple it
-  overwrites.
+  pg_class heap_inplace_update_scan() callers: if the pg_class row pertains to
+  an index (but not RELKIND_PARTITIONED_INDEX), no lock is required.
+  Otherwise, before the call, acquire a lock on the relation in mode
+  ShareUpdateExclusiveLock or stricter.  heap_inplace_update_scan() acquires
+  and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
 
-  pg_class heap_update() callers: before copying the tuple to modify, take a
-  lock on the tuple, a ShareUpdateExclusiveLock on the relation, or a
-  ShareRowExclusiveLock or stricter on the relation.
+  pg_class heap_update() callers: acquire a lock before copying the tuple to
+  modify.  If the pg_class row pertains to an index, lock the tuple.
+  Otherwise, lock the tuple, get a ShareUpdateExclusiveLock on the relation,
+  or get a ShareRowExclusiveLock or stricter on the relation.
 
   SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
   Most heap_update() callers already hold a suitable lock on the relation for
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7de60c1..1e8bd02 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4131,19 +4131,10 @@ check_lock_if_inplace_updateable_rel(Relation relation,
 					dbid = InvalidOid;
 				else
 					dbid = MyDatabaseId;
-
-				if (classForm->relkind == RELKIND_INDEX)
-				{
-					Relation	irel = index_open(relid, AccessShareLock);
-
-					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
-					index_close(irel, AccessShareLock);
-				}
-				else
-					SET_LOCKTAG_RELATION(tag, dbid, relid);
-
-				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
-					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+				SET_LOCKTAG_RELATION(tag, dbid, relid);
+				if (classForm->relkind == RELKIND_INDEX ||
+					(!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					 !LockHeldByMe(&tag, ShareRowExclusiveLock, true)))
 					elog(WARNING,
 						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
 						 NameStr(classForm->relname),
@@ -4181,21 +4172,14 @@ check_inplace_rel_lock(HeapTuple oldtup)
 	Oid			dbid;
 	LOCKTAG		tag;
 
+	if (classForm->relkind == RELKIND_INDEX)
+		return;
+
 	if (IsSharedRelation(relid))
 		dbid = InvalidOid;
 	else
 		dbid = MyDatabaseId;
-
-	if (classForm->relkind == RELKIND_INDEX)
-	{
-		Relation	irel = index_open(relid, AccessShareLock);
-
-		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
-		index_close(irel, AccessShareLock);
-	}
-	else
-		SET_LOCKTAG_RELATION(tag, dbid, relid);
-
+	SET_LOCKTAG_RELATION(tag, dbid, relid);
 	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
 		elog(WARNING,
 			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e4608b9..579cc0d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1558,6 +1558,8 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 				newClassTuple;
 	Form_pg_class oldClassForm,
 				newClassForm;
+	ItemPointerData oldClassTid,
+				newClassTid;
 	HeapTuple	oldIndexTuple,
 				newIndexTuple;
 	Form_pg_index oldIndexForm,
@@ -1569,6 +1571,10 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 
 	/*
 	 * Take a necessary lock on the old and new index before swapping them.
+	 * Since the caller holds session-level locks, this shouldn't deadlock.
+	 * The tuple locks come next, and deadlock is possible there.  There's no
+	 * good use case for altering the temporary index of a REINDEX
+	 * CONCURRENTLY, so don't put effort into avoiding said deadlock.
 	 */
 	oldClassRel = relation_open(oldIndexId, ShareUpdateExclusiveLock);
 	newClassRel = relation_open(newIndexId, ShareUpdateExclusiveLock);
@@ -1576,15 +1582,17 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	/* Now swap names and dependencies of those indexes */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	oldClassTuple = SearchSysCacheCopy1(RELOID,
-										ObjectIdGetDatum(oldIndexId));
+	oldClassTuple = SearchSysCacheLockedCopy1(RELOID,
+											  ObjectIdGetDatum(oldIndexId));
 	if (!HeapTupleIsValid(oldClassTuple))
 		elog(ERROR, "could not find tuple for relation %u", oldIndexId);
-	newClassTuple = SearchSysCacheCopy1(RELOID,
-										ObjectIdGetDatum(newIndexId));
+	newClassTuple = SearchSysCacheLockedCopy1(RELOID,
+											  ObjectIdGetDatum(newIndexId));
 	if (!HeapTupleIsValid(newClassTuple))
 		elog(ERROR, "could not find tuple for relation %u", newIndexId);
 
+	oldClassTid = oldClassTuple->t_self;
+	newClassTid = newClassTuple->t_self;
 	oldClassForm = (Form_pg_class) GETSTRUCT(oldClassTuple);
 	newClassForm = (Form_pg_class) GETSTRUCT(newClassTuple);
 
@@ -1597,8 +1605,10 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	newClassForm->relispartition = oldClassForm->relispartition;
 	oldClassForm->relispartition = isPartition;
 
-	CatalogTupleUpdate(pg_class, &oldClassTuple->t_self, oldClassTuple);
-	CatalogTupleUpdate(pg_class, &newClassTuple->t_self, newClassTuple);
+	CatalogTupleUpdate(pg_class, &oldClassTid, oldClassTuple);
+	UnlockTuple(pg_class, &oldClassTid, InplaceUpdateTupleLock);
+	CatalogTupleUpdate(pg_class, &newClassTid, newClassTuple);
+	UnlockTuple(pg_class, &newClassTid, InplaceUpdateTupleLock);
 
 	heap_freetuple(oldClassTuple);
 	heap_freetuple(newClassTuple);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 78f9678..402dc49 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1074,20 +1074,28 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 				relfilenumber2;
 	RelFileNumber swaptemp;
 	char		swptmpchr;
+	ItemPointerData otid1,
+				otid2;
 	Oid			relam1,
 				relam2;
 
-	/* We need writable copies of both pg_class tuples. */
+	/*
+	 * We need writable copies of both pg_class tuples.  Since r2 is new in
+	 * this transaction, no other process should be getting the tuple lock for
+	 * that one.  Hence, order of tuple lock acquisition doesn't matter.
+	 */
 	relRelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup1 = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(r1));
+	reltup1 = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(r1));
 	if (!HeapTupleIsValid(reltup1))
 		elog(ERROR, "cache lookup failed for relation %u", r1);
+	otid1 = reltup1->t_self;
 	relform1 = (Form_pg_class) GETSTRUCT(reltup1);
 
-	reltup2 = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(r2));
+	reltup2 = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(r2));
 	if (!HeapTupleIsValid(reltup2))
 		elog(ERROR, "cache lookup failed for relation %u", r2);
+	otid2 = reltup2->t_self;
 	relform2 = (Form_pg_class) GETSTRUCT(reltup2);
 
 	relfilenumber1 = relform1->relfilenode;
@@ -1252,10 +1260,8 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		CatalogIndexState indstate;
 
 		indstate = CatalogOpenIndexes(relRelation);
-		CatalogTupleUpdateWithInfo(relRelation, &reltup1->t_self, reltup1,
-								   indstate);
-		CatalogTupleUpdateWithInfo(relRelation, &reltup2->t_self, reltup2,
-								   indstate);
+		CatalogTupleUpdateWithInfo(relRelation, &otid1, reltup1, indstate);
+		CatalogTupleUpdateWithInfo(relRelation, &otid2, reltup2, indstate);
 		CatalogCloseIndexes(indstate);
 	}
 	else
@@ -1264,6 +1270,8 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		CacheInvalidateRelcacheByTuple(reltup1);
 		CacheInvalidateRelcacheByTuple(reltup2);
 	}
+	UnlockTuple(relRelation, &otid1, InplaceUpdateTupleLock);
+	UnlockTuple(relRelation, &otid2, InplaceUpdateTupleLock);
 
 	/*
 	 * Now that pg_class has been updated with its relevant information for
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 03278f6..86956aa 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14353,7 +14353,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 	/* Get its pg_class tuple, too */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relationOid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relationOid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relationOid);
 	tuple_class = (Form_pg_class) GETSTRUCT(tuple);
@@ -14443,7 +14443,9 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 	 * If the new owner is the same as the existing owner, consider the
 	 * command to have succeeded.  This is for dump restoration purposes.
 	 */
-	if (tuple_class->relowner != newOwnerId)
+	if (tuple_class->relowner == newOwnerId)
+		UnlockTuple(class_rel, &tuple->t_self, InplaceUpdateTupleLock);
+	else
 	{
 		Datum		repl_val[Natts_pg_class];
 		bool		repl_null[Natts_pg_class];
@@ -14503,6 +14505,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(class_rel), repl_val, repl_null, repl_repl);
 
 		CatalogTupleUpdate(class_rel, &newtuple->t_self, newtuple);
+		UnlockTuple(class_rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
#58Nitin Motiani
nitinmotiani@google.com
In reply to: Noah Misch (#57)
Re: race condition in pg_class

On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:

My order of preference is: 2, 1, 3.

I kept tuple locking responsibility in heapam.c. That's simpler and better
for modularity, but it does mean we release+acquire after any xmax wait.
Before, we avoided that if the next genam.c scan found the same TID. (If the
next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
are rare enough to justify simplifying as this version does. I don't expect
anyone to notice the starvation outside of tests built to show it. (With
previous versions, one can show it with a purpose-built test that commits
instead of aborting, like the "001_pgbench_grant@9" test.)

This move also loses the optimization of unpinning before XactLockTableWait().
heap_update() doesn't optimize that way, so that's fine.

The move ended up more like (1), though I did do
s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
felt that worked better than (2) to achieve lock release before
CacheInvalidateHeapTuple(). Alternatives that could be fine:

From a consistency point of view, I find it cleaner if we can have all
the heap_inplace_lock and heap_inplace_unlock calls in the same set of
functions. So here those would be the systable_inplace_* functions.

- In the cancel case, call both systable_inplace_update_cancel and
systable_inplace_update_end. _finish or _cancel would own unlock, while
_end would own systable_endscan().

What happens to CacheInvalidateHeapTuple() in this approach? I think
it will still need to be brought to the genam.c layer if we are
releasing the lock in systable_inplace_update_finish.

- Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
tolerable now, this gets less attractive after the inplace160 patch from
/messages/by-id/flat/20240523000548.58.nmisch@google.com

I skimmed through the inplace160 patch. It wasn't clear to me why this
becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace but that looks like it would be called
while holding the lock. And then there is an
AcceptInvalidationMessages which can perhaps be moved to the genam.c
layer too. Is the concern that one invalidation call will be in the
heapam layer and the other will be in the genam layer?

Also I have a small question from inplace120.

I see that at all the places where we check resultRelInfo->ri_needLockTagTuple,
we can just call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage of storing a separate bool field? Also there is another
write to ri_RelationDesc in CatalogOpenIndexes in
src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
be set there also to keep it consistent with ri_RelationDesc. Please
let me know if I am missing something about the usage of the new
field.
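
(For reference, inplace120 appears to set the field in
InitResultRelInfo(), roughly like this; I am quoting the patch from
memory, so treat it as approximate:

    resultRelInfo->ri_needLockTagTuple =
        IsInplaceUpdateRelation(resultRelationDesc);

A ResultRelInfo built outside that path gets only the makeNode()
zeroing, i.e. false.)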

Thanks & Regards,
Nitin Motiani
Google

#59Noah Misch
noah@leadboat.com
In reply to: Nitin Motiani (#58)
Re: race condition in pg_class

On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:

On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:

My order of preference is: 2, 1, 3.

I kept tuple locking responsibility in heapam.c. That's simpler and better
for modularity, but it does mean we release+acquire after any xmax wait.
Before, we avoided that if the next genam.c scan found the same TID. (If the
next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
are rare enough to justify simplifying as this version does. I don't expect
anyone to notice the starvation outside of tests built to show it. (With
previous versions, one can show it with a purpose-built test that commits
instead of aborting, like the "001_pgbench_grant@9" test.)

This move also loses the optimization of unpinning before XactLockTableWait().
heap_update() doesn't optimize that way, so that's fine.

The move ended up more like (1), though I did do
s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
felt that worked better than (2) to achieve lock release before
CacheInvalidateHeapTuple(). Alternatives that could be fine:

From a consistency point of view, I find it cleaner if we can have all
the heap_inplace_lock and heap_inplace_unlock calls in the same set of
functions. So here those would be the systable_inplace_* functions.

That will technically be the case after inplace160, and I could make it so
here by inlining heap_inplace_unlock() into its heapam.c caller. Would that
be cleaner or less clean?

- In the cancel case, call both systable_inplace_update_cancel and
systable_inplace_update_end. _finish or _cancel would own unlock, while
_end would own systable_endscan().

What happens to CacheInvalidateHeapTuple() in this approach? I think
it will still need to be brought to the genam.c layer if we are
releasing the lock in systable_inplace_update_finish.

Cancel scenarios don't do invalidation. (Same under other alternatives.)

- Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
tolerable now, this gets less attractive after the inplace160 patch from
/messages/by-id/flat/20240523000548.58.nmisch@google.com

I skimmed through the inplace160 patch. It wasn't clear to me why this
becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace but that looks like it would be called
while holding the lock. And then there is an
AcceptInvalidationMessages which can perhaps be moved to the genam.c
layer too. Is the concern that one invalidation call will be in the
heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

genam.c:systable_inplace_update_finish
  heapam.c:heap_inplace_update
    PreInplace_Inval
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
  AcceptInvalidationMessages

If we hoisted all of invalidation up to the genam.c layer, a critical section
that starts in heapam.c would end in genam.c:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

If we didn't accept splitting the critical section but did accept splitting
invalidation responsibilities, one gets perhaps:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
  AcceptInvalidationMessages

That's how I ended up at inplace120 v9's design.

Also I have a small question from inplace120.

I see that at all the places where we check resultRelInfo->ri_needLockTagTuple,
we can just call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage of storing a separate bool field? Also there is another

No, not a big advantage. I felt it was more in line with the typical style of
src/backend/executor.

write to ri_RelationDesc in CatalogOpenIndexes in
src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
be set there also to keep it consistent with ri_RelationDesc. Please
let me know if I am missing something about the usage of the new
field.

Can you say more about consequences you found?

Only the full executor reads the field, doing so when it fetches the most
recent version of a row. CatalogOpenIndexes() callers lack the full
executor's practice of fetching the most recent version of a row, so they
couldn't benefit from reading the field.

I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to the
full executor, and "typedef struct ResultRelInfo *CatalogIndexState" exists in
part to keep it that way. Since CatalogOpenIndexes() skips ri_TrigDesc and
other fields, I would expect other malfunctions if some caller tried.

Thanks,
nm

#60Nitin Motiani
nitinmotiani@google.com
In reply to: Noah Misch (#59)
Re: race condition in pg_class

On Sat, Aug 31, 2024 at 6:40 AM Noah Misch <noah@leadboat.com> wrote:

On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:

On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:

My order of preference is: 2, 1, 3.

I kept tuple locking responsibility in heapam.c. That's simpler and better
for modularity, but it does mean we release+acquire after any xmax wait.
Before, we avoided that if the next genam.c scan found the same TID. (If the
next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
are rare enough to justify simplifying as this version does. I don't expect
anyone to notice the starvation outside of tests built to show it. (With
previous versions, one can show it with a purpose-built test that commits
instead of aborting, like the "001_pgbench_grant@9" test.)

This move also loses the optimization of unpinning before XactLockTableWait().
heap_update() doesn't optimize that way, so that's fine.

The move ended up more like (1), though I did do
s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
felt that worked better than (2) to achieve lock release before
CacheInvalidateHeapTuple(). Alternatives that could be fine:

From a consistency point of view, I find it cleaner if we can have all
the heap_inplace_lock and heap_inplace_unlock calls in the same set of
functions. So here those would be the systable_inplace_* functions.

That will technically be the case after inplace160, and I could make it so
here by inlining heap_inplace_unlock() into its heapam.c caller. Would that
be cleaner or less clean?

I am not sure. It seems more inconsistent to take the lock using
heap_inplace_lock but then just unlock by calling LockBuffer. On the
other hand, it doesn't seem that different from the way
SearchSysCacheLocked1 and UnlockTuple are used in inplace120. If we
are doing it this way, perhaps it would be good to rename
heap_inplace_update to heap_inplace_update_and_unlock.
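
To spell out the pattern I mean, the inplace120 usage reads roughly
like this (condensed from the ATExecChangeOwner() hunk; not verbatim):

    tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relationOid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for relation %u", relationOid);
    /* ... heap_modify_tuple() + CatalogTupleUpdate() ... */
    UnlockTuple(class_rel, &tuple->t_self, InplaceUpdateTupleLock);
    ReleaseSysCache(tuple);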

- In the cancel case, call both systable_inplace_update_cancel and
systable_inplace_update_end. _finish or _cancel would own unlock, while
_end would own systable_endscan().

What happens to CacheInvalidateHeapTuple() in this approach? I think
it will still need to be brought to the genam.c layer if we are
releasing the lock in systable_inplace_update_finish.

Cancel scenarios don't do invalidation. (Same under other alternatives.)

Sorry, I wasn't clear about this one. Let me rephrase. My
understanding is that the code in this approach would look like below:

if (dirty)
    systable_inplace_update_finish(inplace_state, tup);
else
    systable_inplace_update_cancel(inplace_state);
systable_inplace_update_end(inplace_state);

And that in this structure, both _finish and _cancel will call
heap_inplace_unlock and then _end will call systable_endscan. So even
with this structure, the invalidation has to happen inside _finish
after the unlock. So this also pulls the invalidation to the genam.c
layer. Am I understanding this correctly?

- Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
tolerable now, this gets less attractive after the inplace160 patch from
/messages/by-id/flat/20240523000548.58.nmisch@google.com

I skimmed through the inplace160 patch. It wasn't clear to me why this
becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace but that looks like it would be called
while holding the lock. And then there is an
AcceptInvalidationMessages which can perhaps be moved to the genam.c
layer too. Is the concern that one invalidation call will be in the
heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

genam.c:systable_inplace_update_finish
  heapam.c:heap_inplace_update
    PreInplace_Inval
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
  AcceptInvalidationMessages

If we hoisted all of invalidation up to the genam.c layer, a critical section
that starts in heapam.c would end in genam.c:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

If we didn't accept splitting the critical section but did accept splitting
invalidation responsibilities, one gets perhaps:

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  heapam.c:heap_inplace_update
    START_CRIT_SECTION
    BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
  AcceptInvalidationMessages

How about this alternative?

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  START_CRIT_SECTION
  heapam.c:heap_inplace_update
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

Looking at inplace160, it seems that the start of the critical section
is right after PreInplace_Inval. So why not pull START_CRIT_SECTION
and END_CRIT_SECTION out to the genam.c layer? Alternatively since
heap_inplace_update is commented as a subroutine of
systable_inplace_update_finish, should everything just be moved to the
genam.c layer? Although it looks like you already considered and
rejected this approach. So just pulling out the critical section's
start and end is fine. Am I missing something here?

If the above alternatives are not possible, it's probably fine to go
ahead with the current patch with the function renamed to
heap_inplace_update_and_unlock (or something similar) as mentioned
earlier?

That's how I ended up at inplace120 v9's design.

Also I have a small question from inplace120.

I see that all the places we check resultRelInfo->ri_needLockTagTuple,
we can just call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage of storing a separate bool field? Also there is another

No, not a big advantage. I felt it was more in line with the typical style of
src/backend/executor.

Thanks for the clarification. For ri_TrigDesc, I see the following
comment in execMain.c :

/* make a copy so as not to depend on relcache info not changing... */
resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);

So in this case I see more value in having a separate field compared
to the bool field for ri_needLockTagTuple.

write to ri_RelationDesc in CatalogOpenIndexes in
src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
be set there also to keep it consistent with ri_RelationDesc. Please
let me know if I am missing something about the usage of the new
field.

Can you say more about consequences you found?

My apologies that I wasn't clear. I haven't found any consequences. I
just find it a smell that there are two fields which are not
independent and can go out of sync. And that's why my preference is
to not have a dependent field unless there is a specific advantage.

Only the full executor reads the field, doing so when it fetches the most
recent version of a row. CatalogOpenIndexes() callers lack the full
executor's practice of fetching the most recent version of a row, so they
couldn't benefit from reading the field.

I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to the
full executor, and "typedef struct ResultRelInfo *CatalogIndexState" exists in
part to keep it that way. Since CatalogOpenIndexes() skips ri_TrigDesc and
other fields, I would expect other malfunctions if some caller tried.

Sorry, I missed the typedef. Thanks for pointing that out. I agree
that the likelihood of any malfunction is low. But even for the
ri_TrigDesc, CatalogOpenIndexes() still sets it to NULL. So shouldn't
ri_needLockTagTuple also be set to a default value of false? My
preference would be not to have a separate bool field to avoid
thinking about these scenarios.

Thanks & Regards
Nitin Motiani
Google

#61Noah Misch
noah@leadboat.com
In reply to: Nitin Motiani (#60)
Re: race condition in pg_class

On Tue, Sep 03, 2024 at 09:24:52PM +0530, Nitin Motiani wrote:

On Sat, Aug 31, 2024 at 6:40 AM Noah Misch <noah@leadboat.com> wrote:

On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:

On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah@leadboat.com> wrote:

- In the cancel case, call both systable_inplace_update_cancel and
systable_inplace_update_end. _finish or _cancel would own unlock, while
_end would own systable_endscan().

What happens to CacheInvalidateHeapTuple() in this approach? I think
it will still need to be brought to the genam.c layer if we are
releasing the lock in systable_inplace_update_finish.

understanding is that the code in this approach would look like below:

if (dirty)
    systable_inplace_update_finish(inplace_state, tup);
else
    systable_inplace_update_cancel(inplace_state);
systable_inplace_update_end(inplace_state);

And that in this structure, both _finish and _cancel will call
heap_inplace_unlock and then _end will call systable_endscan. So even
with this structure, the invalidation has to happen inside _finish
after the unlock.

Right.

So this also pulls the invalidation to the genam.c
layer. Am I understanding this correctly?

Compared to the v9 patch, the "call both" alternative would just move the
systable_endscan() call to a new systable_inplace_update_end(). It wouldn't
move anything across the genam:heapam boundary.
systable_inplace_update_finish() would remain a thin wrapper around a heapam
function.

- Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
tolerable now, this gets less attractive after the inplace160 patch from
/messages/by-id/flat/20240523000548.58.nmisch@google.com

I skimmed through the inplace160 patch. It wasn't clear to me why this
becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace but that looks like it would be called
while holding the lock. And then there is an
AcceptInvalidationMessages which can perhaps be moved to the genam.c
layer too. Is the concern that one invalidation call will be in the
heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

How about this alternative?

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  START_CRIT_SECTION
  heapam.c:heap_inplace_update
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

Looking at inplace160, it seems that the start of the critical section
is right after PreInplace_Inval. So why not pull START_CRIT_SECTION
and END_CRIT_SECTION out to the genam.c layer?

heap_inplace_update() has an elog(ERROR) that needs to happen outside any
critical section. Since the condition for that elog deals with tuple header
internals, it belongs at the heapam layer more than the systable layer.
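
For context, that check reads approximately as follows (paraphrased
from heap_inplace_update(), not verbatim):

    oldlen = ItemIdGetLength(lp) - htup->t_hoff;
    newlen = tuple->t_len - tuple->t_data->t_hoff;
    if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
        elog(ERROR, "wrong tuple length");

An ERROR raised inside a critical section escalates to PANIC, hence
the ordering constraint.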

Alternatively since
heap_inplace_update is commented as a subroutine of
systable_inplace_update_finish, should everything just be moved to the
genam.c layer? Although it looks like you already considered and
rejected this approach.

Calling XLogInsert(RM_HEAP_ID) in genam.c would be a worse modularity
violation than the one that led to the changes between v8 and v9. I think
even calling CacheInvalidateHeapTuple() in genam.c would be a worse modularity
violation than the one attributed to v8. Modularity would have the
heap_inplace function resemble heap_update()'s handling of invals.

If the above alternatives are not possible, it's probably fine to go
ahead with the current patch with the function renamed to
heap_inplace_update_and_unlock (or something similar) as mentioned
earlier?

I like that name. The next version will use it.

I see that at all the places where we check resultRelInfo->ri_needLockTagTuple,
we can just call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage of storing a separate bool field? Also there is another

No, not a big advantage. I felt it was more in line with the typical style of
src/backend/executor.

just find it a smell that there are two fields which are not
independent and can go out of sync. And that's why my preference is
to not have a dependent field unless there is a specific advantage.

Got it. This check happens for every tuple of every UPDATE, so performance
may be a factor. Some designs and their merits:

==== a. ri_needLockTagTuple
Performance: best: check one value for nonzero
Drawback: one more value lifecycle to understand
Drawback: users of ResultRelInfo w/o InitResultRelInfo() could miss this

==== b. call IsInplaceUpdateRelation
Performance: worst: two extern function calls, then compare against two values

==== c. make IsInplaceUpdateRelation() and IsInplaceUpdateOid() inline, and call
Performance: high: compare against two values
Drawback: unlike catalog.c peers
Drawback: extensions that call these must recompile if these change

==== d. add IsInplaceUpdateRelationInline() and IsInplaceUpdateOidInline(), and call
Performance: high: compare against two values
Drawback: more symbols to understand
Drawback: extensions might call these, reaching the drawback of (c)

I think my preference order is (a), (c), (d), (b). How do you see it?
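
For concreteness, (c) would look roughly like this (a sketch only; the
OID list mirrors the current catalog.c definitions):

    static inline bool
    IsInplaceUpdateOid(Oid relid)
    {
        return (relid == RelationRelationId ||
                relid == DatabaseRelationId);
    }

    static inline bool
    IsInplaceUpdateRelation(Relation relation)
    {
        return IsInplaceUpdateOid(RelationGetRelid(relation));
    }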

But even for the
ri_TrigDesc, CatalogOpenIndexes() still sets it to NULL. So shouldn't
ri_needLockTagTuple also be set to a default value of false?

CatalogOpenIndexes() explicitly zero-initializes two fields and relies on
makeNode() zeroing for dozens of others. Hence, omitting the initialization
fits the function's local convention better than including it. (PostgreSQL
has no policy or dominant practice about redundant zero-initialization.)
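
For reference, that setup reads roughly as follows (from
CatalogOpenIndexes() in indexing.c; approximate):

    resultRelInfo = makeNode(ResultRelInfo);
    resultRelInfo->ri_RangeTableIndex = 0;  /* dummy */
    resultRelInfo->ri_RelationDesc = heapRelation;
    resultRelInfo->ri_TrigDesc = NULL;  /* we don't fire triggers */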

#62Nitin Motiani
nitinmotiani@google.com
In reply to: Noah Misch (#61)
Re: race condition in pg_class

On Wed, Sep 4, 2024 at 2:53 AM Noah Misch <noah@leadboat.com> wrote:

So this also pulls the invalidation to the genam.c
layer. Am I understanding this correctly?

Compared to the v9 patch, the "call both" alternative would just move the
systable_endscan() call to a new systable_inplace_update_end(). It wouldn't
move anything across the genam:heapam boundary.
systable_inplace_update_finish() would remain a thin wrapper around a heapam
function.

Thanks for the clarification.

- Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
tolerable now, this gets less attractive after the inplace160 patch from
/messages/by-id/flat/20240523000548.58.nmisch@google.com

I skimmed through the inplace160 patch. It wasn't clear to me why this
becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace but that looks like it would be called
while holding the lock. And then there is an
AcceptInvalidationMessages which can perhaps be moved to the genam.c
layer too. Is the concern that one invalidation call will be in the
heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

How about this alternative?

genam.c:systable_inplace_update_finish
  PreInplace_Inval
  START_CRIT_SECTION
  heapam.c:heap_inplace_update
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

Looking at inplace160, it seems that the start of the critical section
is right after PreInplace_Inval. So why not pull START_CRIT_SECTION
and END_CRIT_SECTION out to the genam.c layer?

heap_inplace_update() has an elog(ERROR) that needs to happen outside any
critical section. Since the condition for that elog deals with tuple header
internals, it belongs at the heapam layer more than the systable layer.

Understood. How about this alternative then? The tuple length check
and the elog(ERROR) get their own function. Something like
heap_inplace_update_validate or
heap_inplace_update_validate_tuple_length. So in that case, it would
look like this:

genam.c:systable_inplace_update_finish
  heapam.c:heap_inplace_update_validate/heap_inplace_update_precheck
  PreInplace_Inval
  START_CRIT_SECTION
  heapam.c:heap_inplace_update
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages
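
Here the new helper could be as small as the following (hypothetical
sketch; the exact arithmetic would follow the existing check):

    static void
    heap_inplace_update_validate(HeapTuple oldtup, HeapTuple newtup)
    {
        /* must run outside any critical section */
        if (oldtup->t_len - oldtup->t_data->t_hoff !=
            newtup->t_len - newtup->t_data->t_hoff ||
            oldtup->t_data->t_hoff != newtup->t_data->t_hoff)
            elog(ERROR, "wrong tuple length");
    }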

This is starting to get complicated though, so I don't have any issues
with just renaming the heap_inplace_update to
heap_inplace_update_and_unlock.

Alternatively since
heap_inplace_update is commented as a subroutine of
systable_inplace_update_finish, should everything just be moved to the
genam.c layer? Although it looks like you already considered and
rejected this approach.

Calling XLogInsert(RM_HEAP_ID) in genam.c would be a worse modularity
violation than the one that led to the changes between v8 and v9. I think
even calling CacheInvalidateHeapTuple() in genam.c would be a worse modularity
violation than the one attributed to v8. Modularity would have the
heap_inplace function resemble heap_update()'s handling of invals.

Understood. Thanks.

If the above alternatives are not possible, it's probably fine to go
ahead with the current patch with the function renamed to
heap_inplace_update_and_unlock (or something similar) as mentioned
earlier?

I like that name. The next version will use it.

So either we go with this or try the above approach of having a
separate function _validate/_precheck/_validate_tuple_length. I don't
have a strong opinion on either of these approaches.

I see that at all the places where we check resultRelInfo->ri_needLockTagTuple,
we can just call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage of storing a separate bool field? Also there is another

No, not a big advantage. I felt it was more in line with the typical style of
src/backend/executor.

just find it a smell that there are two fields which are not
independent and can go out of sync. And that's why my preference is
to not have a dependent field unless there is a specific advantage.

Got it. This check happens for every tuple of every UPDATE, so performance
may be a factor. Some designs and their merits:

Thanks. If performance is a factor, it makes sense to keep it.

==== a. ri_needLockTagTuple
Performance: best: check one value for nonzero
Drawback: one more value lifecycle to understand
Drawback: users of ResultRelInfo w/o InitResultRelInfo() could miss this

==== b. call IsInplaceUpdateRelation
Performance: worst: two extern function calls, then compare against two values

==== c. make IsInplaceUpdateRelation() and IsInplaceUpdateOid() inline, and call
Performance: high: compare against two values
Drawback: unlike catalog.c peers
Drawback: extensions that call these must recompile if these change

==== d. add IsInplaceUpdateRelationInline() and IsInplaceUpdateOidInline(), and call
Performance: high: compare against two values
Drawback: more symbols to understand
Drawback: extensions might call these, reaching the drawback of (c)

I think my preference order is (a), (c), (d), (b). How do you see it?

My preference order would be the same. In general I like (c) more than
(a) but recompiling extensions sounds like a major drawback so here
the preference is (a).

Can we do (a) along with some extra checks? To elaborate, execMain.c
has a function called CheckValidResultRel which is called by
ExecInitModifyTable in nodeModifyTable.c. Can we add the following
assert (or just a debug assert) in this function?

Assert(rel->ri_needLockTagTuple == IsInplaceUpdateRelation(rel->ri_RelationDesc));

This can safeguard against users of ResultRelInfo missing this field.
An alternative might be to only do debug assertions in the functions
which use the field. But it seems simpler to just do it once in the
ExecInitModifyTable.

But even for the
ri_TrigDesc, CatalogOpenIndexes() still sets it to NULL. So shouldn't
ri_needLockTagTuple also be set to a default value of false?

CatalogOpenIndexes() explicitly zero-initializes two fields and relies on
makeNode() zeroing for dozens of others. Hence, omitting the initialization
fits the function's local convention better than including it. (PostgreSQL
has no policy or dominant practice about redundant zero-initialization.)

Thanks. Makes sense.

Thanks & Regards
Nitin Motiani
Google

#63Noah Misch
noah@leadboat.com
In reply to: Nitin Motiani (#62)
4 attachment(s)
Re: race condition in pg_class

On Wed, Sep 04, 2024 at 09:00:32PM +0530, Nitin Motiani wrote:

How about this alternative then? The tuple length check
and the elog(ERROR) get their own function. Something like
heap_inplace_update_validate or
heap_inplace_update_validate_tuple_length. So in that case, it would
look like this:

genam.c:systable_inplace_update_finish
  heapam.c:heap_inplace_update_validate/heap_inplace_update_precheck
  PreInplace_Inval
  START_CRIT_SECTION
  heapam.c:heap_inplace_update
    BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

This is starting to get complicated though, so I don't have any issues
with just renaming the heap_inplace_update to
heap_inplace_update_and_unlock.

Complexity aside, I don't see the _precheck design qualifying as a modularity
improvement.

Assert(rel->ri_needLockTagTuple == IsInplaceUpdateRelation(rel->ri_RelationDesc));

This can safeguard against users of ResultRelInfo missing this field.

v10 does the rename and adds that assertion. This question remains open:

On Thu, Aug 22, 2024 at 12:32:00AM -0700, Noah Misch wrote:

On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:

How many of those for RELKIND_INDEX vs tables? I'm thinking if we should
always require a tuple lock on indexes, if that would make a difference.

Three sites. See attached inplace125 patch. Is it a net improvement? If so,
I'll squash it into inplace120.

If nobody has an opinion, I'll discard inplace125. I feel it's not a net
improvement, but either way is fine with me.

Attachments:

inplace090-LOCKTAG_TUPLE-eoxact-v10.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.
    
    The current use always releases this locktag.  A planned use will
    continue that intent.  It will involve more areas of code, making unlock
    omissions easier.  Warn under debug_assertions, like we do for various
    resource leaks.  Back-patch to v12 (all supported versions), the plan
    for the commit of the new use.
    
    Reviewed by Heikki Linnakangas.
    
    Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a9..51d52c4 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2255,6 +2255,16 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				locallock->numLockOwners = 0;
 		}
 
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Tuple locks are currently held only for short durations within a
+		 * transaction. Check that we didn't forget to release one.
+		 */
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).
inplace110-successors-v10.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix data loss at inplace update after heap_update().
    
    As previously-added tests demonstrated, heap_inplace_update() could
    instead update an unrelated tuple of the same catalog.  It could lose
    the update.  Losing relhasindex=t was a source of index corruption.
    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.
    
    For implementation details, start at the systable_inplace_update_begin()
    header comment and README.tuplock.  Back-patch to v12 (all supported
    versions).  In back branches, retain a deprecated heap_inplace_update(),
    for extensions.
    
    Reviewed by Heikki Linnakangas, Nitin Motiani and Alexander Lakhin.
    
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..ddb2def 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,14 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  Current inplace-updated fields are aligned and are no
+wider than four bytes, and current readers don't need consistency across
+fields.  Hence, they get by with just fetching each field once.  XXX such a
+caller may also read a value that has not reached WAL; see
+systable_inplace_update_finish().
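
To illustrate that reader rule, a sketch of a conforming read (hypothetical
caller; the point is fetching the field exactly once into a local variable):

    Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
    int32       relpages = classForm->relpages;    /* fetch exactly once */

    /* Use the local copy from here on.  Re-reading classForm->relpages
     * could return a different value, because an inplace update may
     * overwrite the field while we hold only a buffer pin. */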
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b2014..0b7dc0a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -63,7 +63,6 @@
 #include "storage/procarray.h"
 #include "storage/standby.h"
 #include "utils/datum.h"
-#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
@@ -6041,61 +6040,167 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 }
 
 /*
- * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
+ * heap_inplace_lock - protect inplace update from concurrent heap_update()
  *
- * Overwriting violates both MVCC and transactional safety, so the uses
- * of this function in Postgres are extremely limited.  Nonetheless we
- * find some places to use it.
+ * Evaluate whether the tuple's state is compatible with a no-key update.
+ * Current transaction rowmarks are fine, as is KEY SHARE from any
+ * transaction.  If compatible, return true with the buffer exclusive-locked,
+ * and the caller must release that by calling
+ * heap_inplace_update_and_unlock(), calling heap_inplace_unlock(), or raising
+ * an error.  Otherwise, return false after blocking transactions, if any,
+ * have ended.
  *
- * The tuple cannot change size, and therefore it's reasonable to assume
- * that its null bitmap (if any) doesn't change either.  So we just
- * overwrite the data portion of the tuple without touching the null
- * bitmap or any of the header fields.
+ * Since this is intended for system catalogs and SERIALIZABLE doesn't cover
+ * DDL, this doesn't guarantee any particular predicate locking.
  *
- * tuple is an in-memory tuple structure containing the data to be written
- * over the target tuple.  Also, tuple->t_self identifies the target tuple.
+ * One could modify this to return true for tuples with delete in progress,
+ * but that's unnecessary: all inplace updaters take a lock that conflicts
+ * with DROP.  If an explicit "DELETE FROM pg_class" is in progress, we'll
+ * wait for it like we would an update.
  *
- * Note that the tuple updated here had better not come directly from the
- * syscache if the relation has a toast relation as this tuple could
- * include toast values that have been expanded, causing a failure here.
+ * Readers of inplace-updated fields expect changes to those fields are
+ * durable.  For example, vac_truncate_clog() reads datfrozenxid from
+ * pg_database tuples via catalog snapshots.  A future snapshot must not
+ * return a lower datfrozenxid for the same database OID (lower in the
+ * FullTransactionIdPrecedes() sense).  We achieve that since no update of a
+ * tuple can start while we hold a lock on its buffer.  In cases like
+ * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only
+ * to this transaction.  ROLLBACK then is one case where it's okay to lose
+ * inplace updates.  (Restoring relhasindex=false on ROLLBACK is fine, since
+ * any concurrent CREATE INDEX would have blocked, then inplace-updated the
+ * committed tuple.)
+ *
+ * In principle, we could avoid waiting by overwriting every tuple in the
+ * updated tuple chain.  Reader expectations permit updating a tuple only if
+ * it's aborted, is the tail of the chain, or we already updated the tuple
+ * referenced in its t_ctid.  Hence, we would need to overwrite the tuples in
+ * order from tail to head.  That would imply either (a) mutating all tuples
+ * in one critical section or (b) accepting a chance of partial completion.
+ * Partial completion of a relfrozenxid update would have the weird
+ * consequence that the table's next VACUUM could see the table's relfrozenxid
+ * move forward between vacuum_get_cutoffs() and finishing.
+ */
+bool
+heap_inplace_lock(Relation relation,
+				  HeapTuple oldtup_ptr, Buffer buffer)
+{
+	HeapTupleData oldtup = *oldtup_ptr; /* minimize diff vs. heap_update() */
+	TM_Result	result;
+	bool		ret;
+
+	Assert(BufferIsValid(buffer));
+
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*----------
+	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
+	 *
+	 * - wait unconditionally
+	 * - no tuple locks
+	 * - don't recheck header after wait: simpler to defer to next iteration
+	 * - don't try to continue even if the updater aborts: likewise
+	 * - no crosscheck
+	 */
+	result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false),
+									  buffer);
+
+	if (result == TM_Invisible)
+	{
+		/* no known way this can happen */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg_internal("attempted to overwrite invisible tuple")));
+	}
+	else if (result == TM_SelfModified)
+	{
+		/*
+		 * CREATE INDEX might reach this if an expression is silly enough to
+		 * call e.g. SELECT ... FROM pg_class FOR SHARE.  C code of other SQL
+		 * statements might get here after a heap_update() of the same row, in
+		 * the absence of an intervening CommandCounterIncrement().
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("tuple to be updated was already modified by an operation triggered by the current command")));
+	}
+	else if (result == TM_BeingModified)
+	{
+		TransactionId xwait;
+		uint16		infomask;
+
+		xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+		infomask = oldtup.t_data->t_infomask;
+
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			LockTupleMode lockmode = LockTupleNoKeyExclusive;
+			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
+			int			remain;
+			bool		current_is_member;
+
+			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
+										lockmode, &current_is_member))
+			{
+				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+				ret = false;
+				MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
+								relation, &oldtup.t_self, XLTW_Update,
+								&remain);
+			}
+			else
+				ret = true;
+		}
+		else if (TransactionIdIsCurrentTransactionId(xwait))
+			ret = true;
+		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
+			ret = true;
+		else
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			ret = false;
+			XactLockTableWait(xwait, relation, &oldtup.t_self,
+							  XLTW_Update);
+		}
+	}
+	else
+	{
+		ret = (result == TM_Ok);
+		if (!ret)
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	/*
+	 * GetCatalogSnapshot() relies on invalidation messages to know when to
+	 * take a new snapshot.  COMMIT of xwait is responsible for sending the
+	 * invalidation.  We're not acquiring heavyweight locks sufficient to
+	 * block if not yet sent, so we must take a new snapshot to ensure a later
+	 * attempt has a fair chance.  While we don't need this if xwait aborted,
+	 * don't bother optimizing that.
+	 */
+	if (!ret)
+		InvalidateCatalogSnapshot();
+	return ret;
+}
+
+/*
+ * heap_inplace_update_and_unlock - core of systable_inplace_update_finish
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
  */
 void
-heap_inplace_update(Relation relation, HeapTuple tuple)
+heap_inplace_update_and_unlock(Relation relation,
+							   HeapTuple oldtup, HeapTuple tuple,
+							   Buffer buffer)
 {
-	Buffer		buffer;
-	Page		page;
-	OffsetNumber offnum;
-	ItemId		lp = NULL;
-	HeapTupleHeader htup;
+	HeapTupleHeader htup = oldtup->t_data;
 	uint32		oldlen;
 	uint32		newlen;
 
-	/*
-	 * For now, we don't allow parallel updates.  Unlike a regular update,
-	 * this should never create a combo CID, so it might be possible to relax
-	 * this restriction, but not without more thought and testing.  It's not
-	 * clear that it would be useful, anyway.
-	 */
-	if (IsInParallelMode())
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
-				 errmsg("cannot update tuples during a parallel operation")));
-
-	INJECTION_POINT("inplace-before-pin");
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
-
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
-	if (PageGetMaxOffsetNumber(page) >= offnum)
-		lp = PageGetItemId(page, offnum);
-
-	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
-		elog(ERROR, "invalid lp");
-
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
-
-	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
+	Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self));
+	oldlen = oldtup->t_len - htup->t_hoff;
 	newlen = tuple->t_len - tuple->t_data->t_hoff;
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
@@ -6107,6 +6212,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 		   (char *) tuple->t_data + tuple->t_data->t_hoff,
 		   newlen);
 
+	/*----------
+	 * XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid:
+	 *
+	 * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+	 * ["R" is a VACUUM tbl]
+	 * D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
+	 * D: systable_getnext() returns pg_class tuple of tbl
+	 * R: memcpy() into pg_class tuple of tbl
+	 * D: raise pg_database.datfrozenxid, XLogInsert(), finish
+	 * [crash]
+	 * [recovery restores datfrozenxid w/o relfrozenxid]
+	 */
+
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
@@ -6127,23 +6245,35 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
 
-		PageSetLSN(page, recptr);
+		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
 	END_CRIT_SECTION();
 
-	UnlockReleaseBuffer(buffer);
+	heap_inplace_unlock(relation, oldtup, buffer);
 
 	/*
 	 * Send out shared cache inval if necessary.  Note that because we only
 	 * pass the new version of the tuple, this mustn't be used for any
 	 * operations that could change catcache lookup keys.  But we aren't
 	 * bothering with index updates either, so that's true a fortiori.
+	 *
+	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
+/*
+ * heap_inplace_unlock - reverse of heap_inplace_lock
+ */
+void
+heap_inplace_unlock(Relation relation,
+					HeapTuple oldtup, Buffer buffer)
+{
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+}
+
 #define		FRM_NOOP				0x0001
 #define		FRM_INVALIDATE_XMAX		0x0002
 #define		FRM_RETURN_IS_XID		0x0004
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 43c95d6..964a9a2 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/transam.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
+#include "utils/injection_point.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/rls.h"
@@ -747,3 +749,140 @@ systable_endscan_ordered(SysScanDesc sysscan)
 		UnregisterSnapshot(sysscan->snapshot);
 	pfree(sysscan);
 }
+
+/*
+ * systable_inplace_update_begin --- update a row "in place" (overwrite it)
+ *
+ * Overwriting violates both MVCC and transactional safety, so the uses of
+ * this function in Postgres are extremely limited.  Nonetheless we find some
+ * places to use it.  Standard flow:
+ *
+ * ... [any slow preparation not requiring oldtup] ...
+ * systable_inplace_update_begin([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *	elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *	systable_inplace_update_finish(inplace_state, tup);
+ * else
+ *	systable_inplace_update_cancel(inplace_state);
+ *
+ * The first several params duplicate the systable_beginscan() param list.
+ * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to
+ * find a live tuple.  (In PROC_IN_VACUUM, that is a low-probability transient
+ * condition.)  If "oldtupcopy" gets non-NULL, you must pass output parameter
+ * "state" to systable_inplace_update_finish() or
+ * systable_inplace_update_cancel().
+ */
+void
+systable_inplace_update_begin(Relation relation,
+							  Oid indexId,
+							  bool indexOK,
+							  Snapshot snapshot,
+							  int nkeys, const ScanKeyData *key,
+							  HeapTuple *oldtupcopy,
+							  void **state)
+{
+	ScanKey		mutable_key = palloc(sizeof(ScanKeyData) * nkeys);
+	int			retries = 0;
+	SysScanDesc scan;
+	HeapTuple	oldtup;
+
+	/*
+	 * For now, we don't allow parallel updates.  Unlike a regular update,
+	 * this should never create a combo CID, so it might be possible to relax
+	 * this restriction, but not without more thought and testing.  It's not
+	 * clear that it would be useful, anyway.
+	 */
+	if (IsInParallelMode())
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+				 errmsg("cannot update tuples during a parallel operation")));
+
+	/*
+	 * Accept a snapshot argument, for symmetry, but this function advances
+	 * its snapshot as needed to reach the tail of the updated tuple chain.
+	 */
+	Assert(snapshot == NULL);
+
+	Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation));
+
+	/* Loop for an exclusive-locked buffer of a non-updated tuple. */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+		BufferHeapTupleTableSlot *bslot;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Processes issuing heap_update (e.g. GRANT) at maximum speed could
+		 * drive us to this error.  A hostile table owner has stronger ways to
+		 * damage their own table, so that's minor.
+		 */
+		if (retries++ > 10000)
+			elog(ERROR, "giving up after too many tries to overwrite row");
+
+		memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys);
+		INJECTION_POINT("inplace-before-pin");
+		scan = systable_beginscan(relation, indexId, indexOK, snapshot,
+								  nkeys, mutable_key);
+		oldtup = systable_getnext(scan);
+		if (!HeapTupleIsValid(oldtup))
+		{
+			systable_endscan(scan);
+			*oldtupcopy = NULL;
+			return;
+		}
+
+		slot = scan->slot;
+		Assert(TTS_IS_BUFFERTUPLE(slot));
+		bslot = (BufferHeapTupleTableSlot *) slot;
+		if (heap_inplace_lock(scan->heap_rel,
+							  bslot->base.tuple, bslot->buffer))
+			break;
+		systable_endscan(scan);
+	}
+
+	*oldtupcopy = heap_copytuple(oldtup);
+	*state = scan;
+}
+
+/*
+ * systable_inplace_update_finish --- second phase of inplace update
+ *
+ * The tuple cannot change size, and therefore its header fields and null
+ * bitmap (if any) don't change either.
+ */
+void
+systable_inplace_update_finish(void *state, HeapTuple tuple)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	Relation	relation = scan->heap_rel;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+
+	heap_inplace_update_and_unlock(relation, oldtup, tuple, buffer);
+	systable_endscan(scan);
+}
+
+/*
+ * systable_inplace_update_cancel --- abandon inplace update
+ *
+ * This is an alternative to making a no-op update.
+ */
+void
+systable_inplace_update_cancel(void *state)
+{
+	SysScanDesc scan = (SysScanDesc) state;
+	Relation	relation = scan->heap_rel;
+	TupleTableSlot *slot = scan->slot;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HeapTuple	oldtup = bslot->base.tuple;
+	Buffer		buffer = bslot->buffer;
+
+	heap_inplace_unlock(relation, oldtup, buffer);
+	systable_endscan(scan);
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3375905..e4608b9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2785,7 +2785,9 @@ index_update_stats(Relation rel,
 {
 	Oid			relid = RelationGetRelid(rel);
 	Relation	pg_class;
+	ScanKeyData key[1];
 	HeapTuple	tuple;
+	void	   *state;
 	Form_pg_class rd_rel;
 	bool		dirty;
 
@@ -2819,33 +2821,12 @@ index_update_stats(Relation rel,
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	/*
-	 * Make a copy of the tuple to update.  Normally we use the syscache, but
-	 * we can't rely on that during bootstrap or while reindexing pg_class
-	 * itself.
-	 */
-	if (IsBootstrapProcessingMode() ||
-		ReindexIsProcessingHeap(RelationRelationId))
-	{
-		/* don't assume syscache will work */
-		TableScanDesc pg_class_scan;
-		ScanKeyData key[1];
-
-		ScanKeyInit(&key[0],
-					Anum_pg_class_oid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(relid));
-
-		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
-		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
-		tuple = heap_copytuple(tuple);
-		table_endscan(pg_class_scan);
-	}
-	else
-	{
-		/* normal case, use syscache */
-		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
-	}
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	systable_inplace_update_begin(pg_class, ClassOidIndexId, true, NULL,
+								  1, key, &tuple, &state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u", relid);
@@ -2908,11 +2889,12 @@ index_update_stats(Relation rel,
 	 */
 	if (dirty)
 	{
-		heap_inplace_update(pg_class, tuple);
+		systable_inplace_update_finish(state, tuple);
 		/* the above sends a cache inval message */
 	}
 	else
 	{
+		systable_inplace_update_cancel(state);
 		/* no need to change tuple, but force relcache inval anyway */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46..ad3082c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -29,6 +29,7 @@
 #include "catalog/toasting.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	 */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
-	if (!HeapTupleIsValid(reltup))
-		elog(ERROR, "cache lookup failed for relation %u", relOid);
-
-	((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
-
 	if (!IsBootstrapProcessingMode())
 	{
 		/* normal case, use a transactional update */
+		reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
 		CatalogTupleUpdate(class_rel, &reltup->t_self, reltup);
 	}
 	else
 	{
 		/* While bootstrapping, we cannot UPDATE, so overwrite in-place */
-		heap_inplace_update(class_rel, reltup);
+
+		ScanKeyData key[1];
+		void	   *state;
+
+		ScanKeyInit(&key[0],
+					Anum_pg_class_oid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(relOid));
+		systable_inplace_update_begin(class_rel, ClassOidIndexId, true,
+									  NULL, 1, key, &reltup, &state);
+		if (!HeapTupleIsValid(reltup))
+			elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+		((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid;
+
+		systable_inplace_update_finish(state, reltup);
 	}
 
 	heap_freetuple(reltup);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 8be435a..40bfd09 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1651,7 +1651,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	Relation	pgdbrel;
 	HeapTuple	tup;
 	ScanKeyData scankey;
-	SysScanDesc scan;
+	void	   *inplace_state;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1790,24 +1790,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgstat_drop_database(db_id);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
-	 */
-	ScanKeyInit(&scankey,
-				Anum_pg_database_datname,
-				BTEqualStrategyNumber, F_NAMEEQ,
-				CStringGetDatum(dbname));
-
-	scan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
-							  NULL, 1, &scankey);
-
-	tup = systable_getnext(scan);
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-	datform = (Form_pg_database) GETSTRUCT(tup);
-
-	/*
 	 * Except for the deletion of the catalog row, subsequent actions are not
 	 * transactional (consider DropDatabaseBuffers() discarding modified
 	 * buffers). But we might crash or get interrupted below. To prevent
@@ -1818,8 +1800,17 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * modification is durable before performing irreversible filesystem
 	 * operations.
 	 */
+	ScanKeyInit(&scankey,
+				Anum_pg_database_datname,
+				BTEqualStrategyNumber, F_NAMEEQ,
+				CStringGetDatum(dbname));
+	systable_inplace_update_begin(pgdbrel, DatabaseNameIndexId, true,
+								  NULL, 1, &scankey, &tup, &inplace_state);
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
 	datform->datconnlimit = DATCONNLIMIT_INVALID_DB;
-	heap_inplace_update(pgdbrel, tup);
+	systable_inplace_update_finish(inplace_state, tup);
 	XLogFlush(XactLastRecEnd);
 
 	/*
@@ -1827,8 +1818,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 * the row will be gone, but if we fail, dropdb() can be invoked again.
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
-
-	systable_endscan(scan);
+	heap_freetuple(tup);
 
 	/*
 	 * Drop db-specific replication slots.
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 7a5ed6b..55baf10 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -946,25 +946,18 @@ EventTriggerOnLogin(void)
 		{
 			Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
 			HeapTuple	tuple;
+			void	   *state;
 			Form_pg_database db;
 			ScanKeyData key[1];
-			SysScanDesc scan;
 
-			/*
-			 * Get the pg_database tuple to scribble on.  Note that this does
-			 * not directly rely on the syscache to avoid issues with
-			 * flattened toast values for the in-place update.
-			 */
+			/* Fetch a copy of the tuple to scribble on */
 			ScanKeyInit(&key[0],
 						Anum_pg_database_oid,
 						BTEqualStrategyNumber, F_OIDEQ,
 						ObjectIdGetDatum(MyDatabaseId));
 
-			scan = systable_beginscan(pg_db, DatabaseOidIndexId, true,
-									  NULL, 1, key);
-			tuple = systable_getnext(scan);
-			tuple = heap_copytuple(tuple);
-			systable_endscan(scan);
+			systable_inplace_update_begin(pg_db, DatabaseOidIndexId, true,
+										  NULL, 1, key, &tuple, &state);
 
 			if (!HeapTupleIsValid(tuple))
 				elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -980,13 +973,15 @@ EventTriggerOnLogin(void)
 				 * that avoids possible waiting on the row-level lock. Second,
 				 * that avoids dealing with TOAST.
 				 *
-				 * It's known that changes made by heap_inplace_update() may
-				 * be lost due to concurrent normal updates.  However, we are
-				 * OK with that.  The subsequent connections will still have a
-				 * chance to set "dathasloginevt" to false.
+				 * Changes made by inplace update may be lost due to
+				 * concurrent normal updates; see inplace-inval.spec. However,
+				 * we are OK with that.  The subsequent connections will still
+				 * have a chance to set "dathasloginevt" to false.
 				 */
-				heap_inplace_update(pg_db, tuple);
+				systable_inplace_update_finish(state, tuple);
 			}
+			else
+				systable_inplace_update_cancel(state);
 			table_close(pg_db, RowExclusiveLock);
 			heap_freetuple(tuple);
 		}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7d8e9d2..9304b8c 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1402,7 +1402,9 @@ vac_update_relstats(Relation relation,
 {
 	Oid			relid = RelationGetRelid(relation);
 	Relation	rd;
+	ScanKeyData key[1];
 	HeapTuple	ctup;
+	void	   *inplace_state;
 	Form_pg_class pgcform;
 	bool		dirty,
 				futurexid,
@@ -1413,7 +1415,12 @@ vac_update_relstats(Relation relation,
 	rd = table_open(RelationRelationId, RowExclusiveLock);
 
 	/* Fetch a copy of the tuple to scribble on */
-	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+	ScanKeyInit(&key[0],
+				Anum_pg_class_oid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+	systable_inplace_update_begin(rd, ClassOidIndexId, true,
+								  NULL, 1, key, &ctup, &inplace_state);
 	if (!HeapTupleIsValid(ctup))
 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
 			 relid);
@@ -1521,7 +1528,9 @@ vac_update_relstats(Relation relation,
 
 	/* If anything changed, write out the tuple. */
 	if (dirty)
-		heap_inplace_update(rd, ctup);
+		systable_inplace_update_finish(inplace_state, ctup);
+	else
+		systable_inplace_update_cancel(inplace_state);
 
 	table_close(rd, RowExclusiveLock);
 
@@ -1573,6 +1582,7 @@ vac_update_datfrozenxid(void)
 	bool		bogus = false;
 	bool		dirty = false;
 	ScanKeyData key[1];
+	void	   *inplace_state;
 
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
@@ -1696,20 +1706,18 @@ vac_update_datfrozenxid(void)
 	relation = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	/*
-	 * Get the pg_database tuple to scribble on.  Note that this does not
-	 * directly rely on the syscache to avoid issues with flattened toast
-	 * values for the in-place update.
+	 * Fetch a copy of the tuple to scribble on.  We could check the syscache
+	 * tuple first.  If that concluded !dirty, we'd avoid waiting on
+	 * concurrent heap_update() and would avoid exclusive-locking the buffer.
+	 * For now, don't optimize that.
 	 */
 	ScanKeyInit(&key[0],
 				Anum_pg_database_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(MyDatabaseId));
 
-	scan = systable_beginscan(relation, DatabaseOidIndexId, true,
-							  NULL, 1, key);
-	tuple = systable_getnext(scan);
-	tuple = heap_copytuple(tuple);
-	systable_endscan(scan);
+	systable_inplace_update_begin(relation, DatabaseOidIndexId, true,
+								  NULL, 1, key, &tuple, &inplace_state);
 
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
@@ -1743,7 +1751,9 @@ vac_update_datfrozenxid(void)
 		newMinMulti = dbform->datminmxid;
 
 	if (dirty)
-		heap_inplace_update(relation, tuple);
+		systable_inplace_update_finish(inplace_state, tuple);
+	else
+		systable_inplace_update_cancel(inplace_state);
 
 	heap_freetuple(tuple);
 	table_close(relation, RowExclusiveLock);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index fdcfbe8..c25f5d1 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -233,5 +233,14 @@ extern SysScanDesc systable_beginscan_ordered(Relation heapRelation,
 extern HeapTuple systable_getnext_ordered(SysScanDesc sysscan,
 										  ScanDirection direction);
 extern void systable_endscan_ordered(SysScanDesc sysscan);
+extern void systable_inplace_update_begin(Relation relation,
+										  Oid indexId,
+										  bool indexOK,
+										  Snapshot snapshot,
+										  int nkeys, const ScanKeyData *key,
+										  HeapTuple *oldtupcopy,
+										  void **state);
+extern void systable_inplace_update_finish(void *state, HeapTuple tuple);
+extern void systable_inplace_update_cancel(void *state);
 
 #endif							/* GENAM_H */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec8..0970941 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -336,7 +336,13 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 bool follow_updates,
 								 Buffer *buffer, struct TM_FailureData *tmfd);
 
-extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+extern bool heap_inplace_lock(Relation relation,
+							  HeapTuple oldtup_ptr, Buffer buffer);
+extern void heap_inplace_update_and_unlock(Relation relation,
+										   HeapTuple oldtup, HeapTuple tuple,
+										   Buffer buffer);
+extern void heap_inplace_unlock(Relation relation,
+								HeapTuple oldtup, Buffer buffer);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out
index 432ece5..a91402c 100644
--- a/src/test/isolation/expected/intra-grant-inplace-db.out
+++ b/src/test/isolation/expected/intra-grant-inplace-db.out
@@ -9,20 +9,20 @@ step b1: BEGIN;
 step grant1: 
 	GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
 
-step vac2: VACUUM (FREEZE);
+step vac2: VACUUM (FREEZE); <waiting ...>
 step snap3: 
 	INSERT INTO frozen_witness
 	SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog;
 
 step c1: COMMIT;
+step vac2: <... completed>
 step cmp3: 
 	SELECT 'datfrozenxid retreated'
 	FROM pg_database
 	WHERE datname = current_catalog
 		AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);
 
-?column?              
-----------------------
-datfrozenxid retreated
-(1 row)
+?column?
+--------
+(0 rows)
 
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index cc1e47a..fe26984 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -14,15 +14,16 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
@@ -58,8 +59,9 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
+step addk2: <... completed>
 
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
@@ -98,7 +100,7 @@ f
 step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
 step c2: COMMIT;
 
-starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
+starting permutation: b3 sfu3 b1 grant1 read2 as3 addk2 r3 c1 read2
 step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
 step sfu3: 
 	SELECT relhasindex FROM pg_class
@@ -122,17 +124,19 @@ relhasindex
 f          
 (1 row)
 
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+step as3: LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
 step c1: COMMIT;
+step addk2: <... completed>
 step read2: 
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
 
 relhasindex
 -----------
-f          
+t          
 (1 row)
 
 
diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec
index bbecd5d..9de40ec 100644
--- a/src/test/isolation/specs/intra-grant-inplace-db.spec
+++ b/src/test/isolation/specs/intra-grant-inplace-db.spec
@@ -42,5 +42,4 @@ step cmp3	{
 }
 
 
-# XXX extant bug
 permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 3cd696b..d07ed3b 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -48,6 +48,7 @@ step sfu3	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
 }
+step as3	{ LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE; }
 step r3	{ ROLLBACK; }
 
 # Additional heap_update()
@@ -73,7 +74,7 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned post-bugfix behavior
+# XXX extant bugs: permutation comments refer to planned future LockTuple()
 
 permutation
 	b1
@@ -117,6 +118,7 @@ permutation
 	b1
 	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
 	read2
+	as3			# XXX temporary until patch adds locking to addk2
 	addk2(c1)	# block in LockTuple() behind grant1
 	r3			# unblock grant1; addk2 now awaits grant1 xmax
 	c1
diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out
index 123f45a..db7dab6 100644
--- a/src/test/modules/injection_points/expected/inplace.out
+++ b/src/test/modules/injection_points/expected/inplace.out
@@ -40,4 +40,301 @@ step read1:
 	SELECT reltuples = -1 AS reltuples_unknown
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 
-ERROR:  could not create unique index "pg_class_oid_index"
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step c2: COMMIT;
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step r2: ROLLBACK;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
+
+starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1
+mkrels
+------
+      
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step begin2: BEGIN;
+step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC;
+step vac1: VACUUM vactest.orig50;  -- wait during inplace update <waiting ...>
+step c2: COMMIT;
+step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC;
+step vac3: VACUUM pg_class;
+step mkrels3: 
+	SELECT vactest.mkrels('intruder', 1, 100);  -- repopulate LP_UNUSED
+	SELECT injection_points_detach('inplace-before-pin');
+	SELECT injection_points_wakeup('inplace-before-pin');
+
+mkrels
+------
+      
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step vac1: <... completed>
+step read1: 
+	REINDEX TABLE pg_class;  -- look for duplicates
+	SELECT reltuples = -1 AS reltuples_unknown
+	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
+
+reltuples_unknown
+-----------------
+f                
+(1 row)
+
diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec
index e957713..86539a5 100644
--- a/src/test/modules/injection_points/specs/inplace.spec
+++ b/src/test/modules/injection_points/specs/inplace.spec
@@ -32,12 +32,9 @@ setup
 	CREATE TABLE vactest.orig50 ();
 	SELECT vactest.mkrels('orig', 51, 100);
 }
-
-# XXX DROP causes an assertion failure; adopt DROP once fixed
 teardown
 {
-	--DROP SCHEMA vactest CASCADE;
-	DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$;
+	DROP SCHEMA vactest CASCADE;
 	DROP EXTENSION injection_points;
 }
 
@@ -56,11 +53,13 @@ step read1	{
 	FROM pg_class WHERE oid = 'vactest.orig50'::regclass;
 }
 
-
 # Transactional updates of the tuple vac1 is waiting to inplace-update.
 session s2
 step grant2		{ GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; }
-
+step revoke2	{ REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; }
+step begin2		{ BEGIN; }
+step c2			{ COMMIT; }
+step r2			{ ROLLBACK; }
 
 # Non-blocking actions.
 session s3
@@ -74,10 +73,69 @@ step mkrels3	{
 }
 
 
-# XXX extant bug
+# target gains a successor at the last moment
 permutation
 	vac1(mkrels3)	# reads pg_class tuple T0 for vactest.orig50, xmax invalid
 	grant2			# T0 becomes eligible for pruning, T1 is successor
 	vac3			# T0 becomes LP_UNUSED
-	mkrels3			# T0 reused; vac1 wakes and overwrites the reused T0
+	mkrels3			# vac1 wakes, scans to T1
 	read1
+
+# target already has a successor, which commits
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	c2				# T0 becomes eligible for pruning
+	vac3			# T0 becomes LP_UNUSED
+	mkrels3			# vac1 wakes, scans to T1
+	read1
+
+# target already has a successor, which becomes LP_UNUSED at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1
+	vac1(mkrels3)	# reads T0 for vactest.orig50
+	r2				# T1 becomes eligible for pruning
+	vac3			# T1 becomes LP_UNUSED
+	mkrels3			# reuse T1; vac1 scans to T0
+	read1
+
+# target already has a successor, which becomes LP_REDIRECT at the last moment
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	c2
+	revoke2			# HOT update to T2
+	grant2			# HOT update to T3
+	vac3			# T1 becomes LP_REDIRECT
+	mkrels3			# reuse T2; vac1 scans to T3
+	read1
+
+# waiting for updater to end
+permutation
+	vac1(c2)		# reads pg_class tuple T0 for vactest.orig50, xmax invalid
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	revoke2			# HOT update to T2
+	mkrels3			# vac1 awakes briefly, then waits for s2
+	c2
+	read1
+
+# Another LP_UNUSED.  This time, do change the live tuple.  Final live tuple
+# body is identical to original, at a different TID.
+permutation
+	begin2
+	grant2			# T0.t_ctid = T1, non-HOT due to filled page
+	vac1(mkrels3)	# reads T0
+	r2				# T1 becomes eligible for pruning
+	grant2			# T0.t_ctid = T2; T0 becomes eligible for pruning
+	revoke2			# T2.t_ctid = T3; T2 becomes eligible for pruning
+	vac3			# T0, T1 & T2 become LP_UNUSED
+	mkrels3			# reuse T0, T1 & T2; vac1 scans to T3
+	read1
+
+# Another LP_REDIRECT.  Compared to the earlier test, omit the last grant2.
+# Hence, final live tuple body is identical to original, at a different TID.
+permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
inplace120-locktag-v10.patch
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    For inplace update durability, make heap_update() callers wait.
    
    The previous commit fixed some ways of losing an inplace update.  It
    remained possible to lose one when a backend working toward a
    heap_update() copied a tuple into memory just before inplace update of
    that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
    to govern admission to the steps of copying an old tuple, modifying it,
    and issuing heap_update().  This includes UPDATE and MERGE commands.  To
    avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE
    when holding a relation lock sufficient to exclude inplace updaters.
    Back-patch to v12 (all supported versions).
    
    Reviewed by Nitin Motiani and FIXME.
    
    Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index ddb2def..818cd7f 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -154,6 +154,48 @@ The following infomask bits are applicable:
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
 
+Locking to write inplace-updated tables
+---------------------------------------
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives systable_inplace_update_begin() calls.  Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update.  In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update().  Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer.  We enforce that
+using locktags as follows.  While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
+are per-catalog:
+
+  pg_class systable_inplace_update_begin() callers: before the call, acquire a
+  lock on the relation in mode ShareUpdateExclusiveLock or stricter.  If the
+  update targets a row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX),
+  that lock must be on the table.  Locking the index rel is not necessary.
+  (This allows VACUUM to overwrite per-index pg_class while holding a lock on
+  the table alone.) systable_inplace_update_begin() acquires and releases
+  LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each
+  tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock on the tuple, a ShareUpdateExclusiveLock on the relation, or a
+  ShareRowExclusiveLock or stricter on the relation.
+
+  SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
+  Most heap_update() callers already hold a suitable lock on the relation for
+  other reasons and can skip the tuple lock.  If you do acquire the tuple
+  lock, release it immediately after the update.
+
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE.  (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks.  We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
 Reading inplace-updated columns
 -------------------------------
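
Putting the pg_class rules above together, the tuple-lock route for a
heap_update() caller looks roughly like the pattern the aclchk.c hunk below
adopts (a sketch; error paths and the heap_modify_tuple() step are trimmed):

    tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for relation %u", relOid);

    /* ... build newtuple from a modified copy of "tuple" ... */

    CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
    /* release the tuple lock immediately after the update */
    UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
    ReleaseSysCache(tuple);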
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0b7dc0a..4d75f7a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,8 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -75,6 +77,12 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -121,6 +129,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly!  Use get_mxact_status_for_lock
  * instead.
  */
@@ -3207,6 +3217,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4071,6 +4085,128 @@ l2:
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG		tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid			relid = classForm->oid;
+				Oid			dbid;
+				LOCKTAG		tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation	irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid			relid = classForm->oid;
+	Oid			dbid;
+	LOCKTAG		tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation	irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same.  Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6088,15 +6224,21 @@ heap_inplace_lock(Relation relation,
 	TM_Result	result;
 	bool		ret;
 
+#ifdef USE_ASSERT_CHECKING
+	if (RelationGetRelid(relation) == RelationRelationId)
+		check_inplace_rel_lock(oldtup_ptr);
+#endif
+
 	Assert(BufferIsValid(buffer));
 
+	LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/*----------
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - already locked tuple above, since inplace needs that unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
@@ -6180,7 +6322,10 @@ heap_inplace_lock(Relation relation,
 	 * don't bother optimizing that.
 	 */
 	if (!ret)
+	{
+		UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 		InvalidateCatalogSnapshot();
+	}
 	return ret;
 }
 
@@ -6189,6 +6334,8 @@ heap_inplace_lock(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_and_unlock(Relation relation,
@@ -6272,6 +6419,7 @@ heap_inplace_unlock(Relation relation,
 					HeapTuple oldtup, Buffer buffer)
 {
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 }
 
 #define		FRM_NOOP				0x0001
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 964a9a2..83d5717 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -755,7 +755,9 @@ systable_endscan_ordered(SysScanDesc sysscan)
  *
  * Overwriting violates both MVCC and transactional safety, so the uses of
  * this function in Postgres are extremely limited.  Nonetheless we find some
- * places to use it.  Standard flow:
+ * places to use it.  See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates.  Standard flow:
  *
  * ... [any slow preparation not requiring oldtup] ...
  * systable_inplace_update_begin([...], &tup, &inplace_state);
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index a44ccee..bc0e259 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple	tuple;
 		ListCell   *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid		   *oldmembers;
 		Oid		   *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6c39434..8aefbcd 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ *		This is used for assertions and for making the executor follow the
+ *		locking protocol described at README.tuplock section "Locking to write
+ *		inplace-updated tables".  Extensions may inplace-update other heap
+ *		tables, but concurrent SQL UPDATE on the same table may overwrite
+ *		those modifications.
+ *
+ *		The executor can assume these are not partitions or partitioned and
+ *		have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 40bfd09..aa91a39 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1877,6 +1877,7 @@ RenameDatabase(const char *oldname, const char *newname)
 {
 	Oid			db_id;
 	HeapTuple	newtup;
+	ItemPointerData otid;
 	Relation	rel;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1948,11 +1949,13 @@ RenameDatabase(const char *oldname, const char *newname)
 				 errdetail_busy_db(notherbackends, npreparedxacts)));
 
 	/* rename */
-	newtup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	newtup = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
 	if (!HeapTupleIsValid(newtup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
+	otid = newtup->t_self;
 	namestrcpy(&(((Form_pg_database) GETSTRUCT(newtup))->datname), newname);
-	CatalogTupleUpdate(rel, &newtup->t_self, newtup);
+	CatalogTupleUpdate(rel, &otid, newtup);
+	UnlockTuple(rel, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2201,6 +2204,7 @@ movedb(const char *dbname, const char *tblspcname)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_DATABASE),
 					 errmsg("database \"%s\" does not exist", dbname)));
+		LockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_tblspcoid);
 		new_record_repl[Anum_pg_database_dattablespace - 1] = true;
@@ -2209,6 +2213,7 @@ movedb(const char *dbname, const char *tblspcname)
 									 new_record,
 									 new_record_nulls, new_record_repl);
 		CatalogTupleUpdate(pgdbrel, &oldtuple->t_self, newtuple);
+		UnlockTuple(pgdbrel, &oldtuple->t_self, InplaceUpdateTupleLock);
 
 		InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2439,6 +2444,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", stmt->dbname)));
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;
@@ -2488,6 +2494,7 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), new_record,
 								 new_record_nulls, new_record_repl);
 	CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, dboid, 0);
 
@@ -2537,6 +2544,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	if (!object_ownercheck(DatabaseRelationId, db_id, GetUserId()))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 					   stmt->dbname);
+	LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
@@ -2565,6 +2573,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
+		HeapTuple	newtuple;
 
 		ereport(NOTICE,
 				(errmsg("changing version from %s to %s",
@@ -2573,14 +2582,15 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
-		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
-								  values, nulls, replaces);
-		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
-		heap_freetuple(tuple);
+		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+									 values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, newtuple);
+		heap_freetuple(newtuple);
 	}
 	else
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
+	UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(DatabaseRelationId, db_id, 0);
 
@@ -2692,6 +2702,8 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied to change owner of database")));
 
+		LockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
+
 		repl_repl[Anum_pg_database_datdba - 1] = true;
 		repl_val[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(newOwnerId);
 
@@ -2713,6 +2725,7 @@ AlterDatabaseOwner(const char *dbname, Oid newOwnerId)
 
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel), repl_val, repl_null, repl_repl);
 		CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
+		UnlockTuple(rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 55baf10..05a6de6 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -388,6 +388,7 @@ SetDatabaseHasLoginEventTriggers(void)
 	/* Set dathasloginevt flag in pg_database */
 	Form_pg_database db;
 	Relation	pg_db = table_open(DatabaseRelationId, RowExclusiveLock);
+	ItemPointerData otid;
 	HeapTuple	tuple;
 
 	/*
@@ -399,16 +400,18 @@ SetDatabaseHasLoginEventTriggers(void)
 	 */
 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	tuple = SearchSysCacheLockedCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	otid = tuple->t_self;
 	db = (Form_pg_database) GETSTRUCT(tuple);
 	if (!db->dathasloginevt)
 	{
 		db->dathasloginevt = true;
-		CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_db, &otid, tuple);
 		CommandCounterIncrement();
 	}
+	UnlockTuple(pg_db, &otid, InplaceUpdateTupleLock);
 	table_close(pg_db, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c5a56c7..6b22a88 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4413,14 +4413,17 @@ update_relispartition(Oid relationId, bool newval)
 {
 	HeapTuple	tup;
 	Relation	classRel;
+	ItemPointerData otid;
 
 	classRel = table_open(RelationRelationId, RowExclusiveLock);
-	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
+	tup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relationId));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relationId);
+	otid = tup->t_self;
 	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
 	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
-	CatalogTupleUpdate(classRel, &tup->t_self, tup);
+	CatalogTupleUpdate(classRel, &otid, tup);
+	UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b3cc6f8..7870e93 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3586,6 +3586,7 @@ SetRelationTableSpace(Relation rel,
 {
 	Relation	pg_class;
 	HeapTuple	tuple;
+	ItemPointerData otid;
 	Form_pg_class rd_rel;
 	Oid			reloid = RelationGetRelid(rel);
 
@@ -3594,9 +3595,10 @@ SetRelationTableSpace(Relation rel,
 	/* Get a modifiable copy of the relation's pg_class row. */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	tuple = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(reloid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	otid = tuple->t_self;
 	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 
 	/* Update the pg_class row. */
@@ -3604,7 +3606,8 @@ SetRelationTableSpace(Relation rel,
 		InvalidOid : newTableSpaceId;
 	if (RelFileNumberIsValid(newRelFilenumber))
 		rd_rel->relfilenode = newRelFilenumber;
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+	CatalogTupleUpdate(pg_class, &otid, tuple);
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 
 	/*
 	 * Record dependency on tablespace.  This is only required for relations
@@ -4098,6 +4101,7 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 {
 	Relation	targetrelation;
 	Relation	relrelation;	/* for RELATION relation */
+	ItemPointerData otid;
 	HeapTuple	reltup;
 	Form_pg_class relform;
 	Oid			namespaceId;
@@ -4120,7 +4124,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	relrelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	reltup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(myrelid));
 	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	otid = reltup->t_self;
 	relform = (Form_pg_class) GETSTRUCT(reltup);
@@ -4147,7 +4152,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
-	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+	CatalogTupleUpdate(relrelation, &otid, reltup);
+	UnlockTuple(relrelation, &otid, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
 								 InvalidOid, is_internal);
@@ -14875,7 +14881,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 
 	/* Fetch heap tuple */
 	relid = RelationGetRelid(rel);
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 
@@ -14979,6 +14985,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 								 repl_val, repl_null, repl_repl);
 
 	CatalogTupleUpdate(pgclass, &newtuple->t_self, newtuple);
+	UnlockTuple(pgclass, &tuple->t_self, InplaceUpdateTupleLock);
 
 	InvokeObjectPostAlterHook(RelationRelationId, RelationGetRelid(rel), 0);
 
@@ -17131,7 +17138,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	ObjectAddress thisobj;
 	bool		already_done = false;
 
-	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	/* no rel lock for relkind=c so use LOCKTAG_TUPLE */
+	classTup = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(classTup))
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
@@ -17150,6 +17158,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 	already_done = object_address_present(&thisobj, objsMoved);
 	if (!already_done && oldNspOid != newNspOid)
 	{
+		ItemPointerData otid = classTup->t_self;
+
 		/* check for duplicate name (more friendly than unique-index failure) */
 		if (get_relname_relid(NameStr(classForm->relname),
 							  newNspOid) != InvalidOid)
@@ -17162,7 +17172,9 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 		/* classTup is a copy, so OK to scribble on */
 		classForm->relnamespace = newNspOid;
 
-		CatalogTupleUpdate(classRel, &classTup->t_self, classTup);
+		CatalogTupleUpdate(classRel, &otid, classTup);
+		UnlockTuple(classRel, &otid, InplaceUpdateTupleLock);
 
 		/* Update dependency on schema if caller said so */
 		if (hasDependEntry &&
@@ -17174,6 +17186,8 @@ AlterRelationNamespaceInternal(Relation classRel, Oid relOid,
 			elog(ERROR, "could not change schema dependency for relation \"%s\"",
 				 NameStr(classForm->relname));
 	}
+	else
+		UnlockTuple(classRel, &classTup->t_self, InplaceUpdateTupleLock);
 	if (!already_done)
 	{
 		add_exact_object_address(&thisobj, objsMoved);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186f..f04cd0b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1024,6 +1024,10 @@ CheckValidResultRel(ResultRelInfo *resultRelInfo, CmdType operation,
 	Relation	resultRel = resultRelInfo->ri_RelationDesc;
 	FdwRoutine *fdwroutine;
 
+	/* Expect a fully-formed ResultRelInfo from InitResultRelInfo(). */
+	Assert(resultRelInfo->ri_needLockTagTuple ==
+		   IsInplaceUpdateRelation(resultRel));
+
 	switch (resultRel->rd_rel->relkind)
 	{
 		case RELKIND_RELATION:
@@ -1204,6 +1208,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_NumIndices = 0;
 	resultRelInfo->ri_IndexRelationDescs = NULL;
 	resultRelInfo->ri_IndexRelationInfo = NULL;
+	resultRelInfo->ri_needLockTagTuple =
+		IsInplaceUpdateRelation(resultRelationDesc);
 	/* make a copy so as not to depend on relcache info not changing... */
 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
 	if (resultRelInfo->ri_TrigDesc)
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 1086cbc..54025c9 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -661,8 +661,12 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
-	/* For now we support only tables. */
+	/*
+	 * We support only non-system tables, with
+	 * check_publication_add_relation() accountable.
+	 */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+	Assert(!IsCatalogRelation(rel));
 
 	CheckCmdReplicaIdentity(rel, CMD_UPDATE);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80..1161520 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2324,6 +2324,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	}
 	else
 	{
+		ItemPointerData lockedtid;
+
 		/*
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here to try again.  (We don't need to redo triggers,
@@ -2332,6 +2334,7 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 		 * to do them again.)
 		 */
 redo_act:
+		lockedtid = *tupleid;
 		result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
 							   canSetTag, &updateCxt);
 
@@ -2425,6 +2428,14 @@ redo_act:
 								ExecInitUpdateProjection(context->mtstate,
 														 resultRelInfo);
 
+							if (resultRelInfo->ri_needLockTagTuple)
+							{
+								UnlockTuple(resultRelationDesc,
+											&lockedtid, InplaceUpdateTupleLock);
+								LockTuple(resultRelationDesc,
+										  tupleid, InplaceUpdateTupleLock);
+							}
+
 							/* Fetch the most recent version of old tuple. */
 							oldSlot = resultRelInfo->ri_oldTupleSlot;
 							if (!table_tuple_fetch_row_version(resultRelationDesc,
@@ -2529,6 +2540,14 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	TransactionId xmin;
 	bool		isnull;
 
+	/*
+	 * Parse analysis should have blocked ON CONFLICT for all system
+	 * relations, which includes these.  There's no fundamental obstacle to
+	 * supporting this; we'd just need to handle LOCKTAG_TUPLE like the other
+	 * ExecUpdate() caller.
+	 */
+	Assert(!resultRelInfo->ri_needLockTagTuple);
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(context->estate, resultRelInfo);
 
@@ -2854,6 +2873,7 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	ModifyTableState *mtstate = context->mtstate;
 	List	  **mergeActions = resultRelInfo->ri_MergeActions;
+	ItemPointerData lockedtid;
 	List	   *actionStates;
 	TupleTableSlot *newslot = NULL;
 	TupleTableSlot *rslot = NULL;
@@ -2890,14 +2910,32 @@ ExecMergeMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	 * target wholerow junk attr.
 	 */
 	Assert(tupleid != NULL || oldtuple != NULL);
+	ItemPointerSetInvalid(&lockedtid);
 	if (oldtuple != NULL)
+	{
+		Assert(!resultRelInfo->ri_needLockTagTuple);
 		ExecForceStoreHeapTuple(oldtuple, resultRelInfo->ri_oldTupleSlot,
 								false);
-	else if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
-											tupleid,
-											SnapshotAny,
-											resultRelInfo->ri_oldTupleSlot))
-		elog(ERROR, "failed to fetch the target tuple");
+	}
+	else
+	{
+		if (resultRelInfo->ri_needLockTagTuple)
+		{
+			/*
+			 * This locks even for CMD_DELETE, for CMD_NOTHING, and for tuples
+			 * that don't match mas_whenqual.  MERGE on system catalogs is a
+			 * minor use case, so don't bother optimizing those.
+			 */
+			LockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+					  InplaceUpdateTupleLock);
+			lockedtid = *tupleid;
+		}
+		if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+										   tupleid,
+										   SnapshotAny,
+										   resultRelInfo->ri_oldTupleSlot))
+			elog(ERROR, "failed to fetch the target tuple");
+	}
 
 	/*
 	 * Test the join condition.  If it's satisfied, perform a MATCHED action.
@@ -2969,7 +3007,7 @@ lmerge_matched:
 										tupleid, NULL, newslot, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -2980,11 +3018,11 @@ lmerge_matched:
 				{
 					if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 											  oldtuple, newslot))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecUpdateAct(context, resultRelInfo, tupleid,
@@ -3003,7 +3041,8 @@ lmerge_matched:
 					if (updateCxt.crossPartUpdate)
 					{
 						mtstate->mt_merge_updated += 1;
-						return context->cpUpdateReturningSlot;
+						rslot = context->cpUpdateReturningSlot;
+						goto out;
 					}
 				}
 
@@ -3021,7 +3060,7 @@ lmerge_matched:
 										NULL, NULL, &result))
 				{
 					if (result == TM_Ok)
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 
 					break;		/* concurrent update/delete */
 				}
@@ -3032,11 +3071,11 @@ lmerge_matched:
 				{
 					if (!ExecIRDeleteTriggers(estate, resultRelInfo,
 											  oldtuple))
-						return NULL;	/* "do nothing" */
+						goto out;	/* "do nothing" */
 				}
 				else
 				{
-					/* called table_tuple_fetch_row_version() above */
+					/* checked ri_needLockTagTuple above */
 					Assert(oldtuple == NULL);
 
 					result = ExecDeleteAct(context, resultRelInfo, tupleid,
@@ -3118,7 +3157,7 @@ lmerge_matched:
 				 * let caller handle it under NOT MATCHED [BY TARGET] clauses.
 				 */
 				*matched = false;
-				return NULL;
+				goto out;
 
 			case TM_Updated:
 				{
@@ -3192,7 +3231,7 @@ lmerge_matched:
 								 * more to do.
 								 */
 								if (TupIsNull(epqslot))
-									return NULL;
+									goto out;
 
 								/*
 								 * If we got a NULL ctid from the subplan, the
@@ -3210,6 +3249,15 @@ lmerge_matched:
 								 * we need to switch to the NOT MATCHED BY
 								 * SOURCE case.
 								 */
+								if (resultRelInfo->ri_needLockTagTuple)
+								{
+									if (ItemPointerIsValid(&lockedtid))
+										UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+													InplaceUpdateTupleLock);
+									LockTuple(resultRelInfo->ri_RelationDesc, &context->tmfd.ctid,
+											  InplaceUpdateTupleLock);
+									lockedtid = context->tmfd.ctid;
+								}
 								if (!table_tuple_fetch_row_version(resultRelationDesc,
 																   &context->tmfd.ctid,
 																   SnapshotAny,
@@ -3238,7 +3286,7 @@ lmerge_matched:
 							 * MATCHED [BY TARGET] actions
 							 */
 							*matched = false;
-							return NULL;
+							goto out;
 
 						case TM_SelfModified:
 
@@ -3266,13 +3314,13 @@ lmerge_matched:
 
 							/* This shouldn't happen */
 							elog(ERROR, "attempted to update or delete invisible tuple");
-							return NULL;
+							goto out;
 
 						default:
 							/* see table_tuple_lock call in ExecDelete() */
 							elog(ERROR, "unexpected table_tuple_lock status: %u",
 								 result);
-							return NULL;
+							goto out;
 					}
 				}
 
@@ -3319,6 +3367,10 @@ lmerge_matched:
 	/*
 	 * Successfully executed an action or no qualifying action was found.
 	 */
+out:
+	if (ItemPointerIsValid(&lockedtid))
+		UnlockTuple(resultRelInfo->ri_RelationDesc, &lockedtid,
+					InplaceUpdateTupleLock);
 	return rslot;
 }
 
@@ -3770,6 +3822,7 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		tuplock;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -4082,6 +4135,8 @@ ExecModifyTable(PlanState *pstate)
 				break;
 
 			case CMD_UPDATE:
+				tuplock = false;
+
 				/* Initialize projection info if first time for this table */
 				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
 					ExecInitUpdateProjection(node, resultRelInfo);
@@ -4093,6 +4148,7 @@ ExecModifyTable(PlanState *pstate)
 				oldSlot = resultRelInfo->ri_oldTupleSlot;
 				if (oldtuple != NULL)
 				{
+					Assert(!resultRelInfo->ri_needLockTagTuple);
 					/* Use the wholerow junk attr as the old tuple. */
 					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
 				}
@@ -4101,6 +4157,11 @@ ExecModifyTable(PlanState *pstate)
 					/* Fetch the most recent version of old tuple. */
 					Relation	relation = resultRelInfo->ri_RelationDesc;
 
+					if (resultRelInfo->ri_needLockTagTuple)
+					{
+						LockTuple(relation, tupleid, InplaceUpdateTupleLock);
+						tuplock = true;
+					}
 					if (!table_tuple_fetch_row_version(relation, tupleid,
 													   SnapshotAny,
 													   oldSlot))
@@ -4112,6 +4173,9 @@ ExecModifyTable(PlanState *pstate)
 				/* Now apply the update. */
 				slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
 								  slot, node->canSetTag);
+				if (tuplock)
+					UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
+								InplaceUpdateTupleLock);
 				break;
 
 			case CMD_DELETE:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 63efc55..16d2c96 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3768,6 +3768,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 {
 	RelFileNumber newrelfilenumber;
 	Relation	pg_class;
+	ItemPointerData otid;
 	HeapTuple	tuple;
 	Form_pg_class classform;
 	MultiXactId minmulti = InvalidMultiXactId;
@@ -3810,11 +3811,12 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	 */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
+	tuple = SearchSysCacheLockedCopy1(RELOID,
+									  ObjectIdGetDatum(RelationGetRelid(relation)));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "could not find tuple for relation %u",
 			 RelationGetRelid(relation));
+	otid = tuple->t_self;
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
 	/*
@@ -3934,9 +3936,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		classform->relminmxid = minmulti;
 		classform->relpersistence = persistence;
 
-		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+		CatalogTupleUpdate(pg_class, &otid, tuple);
 	}
 
+	UnlockTuple(pg_class, &otid, InplaceUpdateTupleLock);
 	heap_freetuple(tuple);
 
 	table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 3e03dfc..50c9440 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -30,7 +30,10 @@
 #include "catalog/pg_shseclabel_d.h"
 #include "common/int.h"
 #include "lib/qunique.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -269,6 +272,98 @@ ReleaseSysCache(HeapTuple tuple)
 }
 
 /*
+ * SearchSysCacheLocked1
+ *
+ * Combine SearchSysCache1() with acquiring a LOCKTAG_TUPLE at mode
+ * InplaceUpdateTupleLock.  This is a tool for complying with the
+ * README.tuplock section "Locking to write inplace-updated tables".  After
+ * the caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock)
+ * and ReleaseSysCache().
+ *
+ * The returned tuple may be the subject of an uncommitted update, so this
+ * doesn't prevent the "tuple concurrently updated" error.
+ */
+HeapTuple
+SearchSysCacheLocked1(int cacheId,
+					  Datum key1)
+{
+	ItemPointerData tid;
+	LOCKTAG		tag;
+	Oid			dboid =
+		SysCache[cacheId]->cc_relisshared ? InvalidOid : MyDatabaseId;
+	Oid			reloid = cacheinfo[cacheId].reloid;
+
+	/*----------
+	 * Since inplace updates may happen just before our LockTuple(), we must
+	 * return content acquired after LockTuple() of the TID we return.  If we
+	 * just fetched twice instead of looping, the following sequence would
+	 * defeat our locking:
+	 *
+	 * GRANT:   SearchSysCache1() = TID (1,5)
+	 * GRANT:   LockTuple(pg_class, (1,5))
+	 * [no more inplace update of (1,5) until we release the lock]
+	 * CLUSTER: SearchSysCache1() = TID (1,5)
+	 * CLUSTER: heap_update() = TID (1,8)
+	 * CLUSTER: COMMIT
+	 * GRANT:   SearchSysCache1() = TID (1,8)
+	 * GRANT:   return (1,8) from SearchSysCacheLocked1()
+	 * VACUUM:  SearchSysCache1() = TID (1,8)
+	 * VACUUM:  LockTuple(pg_class, (1,8))  # two TIDs now locked for one rel
+	 * VACUUM:  inplace update
+	 * GRANT:   heap_update() = (1,9)  # lose inplace update
+	 *
+	 * In the happy case, this takes two fetches, one to determine the TID to
+	 * lock and another to get the content and confirm the TID didn't change.
+	 *
+	 * This is valid even if the row gets updated to a new TID, the old TID
+	 * becomes LP_UNUSED, and the row gets updated back to its old TID.  We'd
+	 * still hold the right LOCKTAG_TUPLE and a copy of the row captured after
+	 * the LOCKTAG_TUPLE.
+	 */
+	ItemPointerSetInvalid(&tid);
+	for (;;)
+	{
+		HeapTuple	tuple;
+		LOCKMODE	lockmode = InplaceUpdateTupleLock;
+
+		tuple = SearchSysCache1(cacheId, key1);
+		if (ItemPointerIsValid(&tid))
+		{
+			if (!HeapTupleIsValid(tuple))
+			{
+				LockRelease(&tag, lockmode, false);
+				return tuple;
+			}
+			if (ItemPointerEquals(&tid, &tuple->t_self))
+				return tuple;
+			LockRelease(&tag, lockmode, false);
+		}
+		else if (!HeapTupleIsValid(tuple))
+			return tuple;
+
+		tid = tuple->t_self;
+		ReleaseSysCache(tuple);
+		/* like: LockTuple(rel, &tid, lockmode) */
+		SET_LOCKTAG_TUPLE(tag, dboid, reloid,
+						  ItemPointerGetBlockNumber(&tid),
+						  ItemPointerGetOffsetNumber(&tid));
+		(void) LockAcquire(&tag, lockmode, false, false);
+
+		/*
+		 * If an inplace update just finished, ensure we process the syscache
+		 * inval.  XXX this is insufficient: the inplace updater may not yet
+		 * have reached AtEOXact_Inval().  See test at inplace-inval.spec.
+		 *
+		 * If a heap_update() call just released its LOCKTAG_TUPLE, we'll
+		 * probably find the old tuple and reach "tuple concurrently updated".
+		 * If that heap_update() aborts, our LOCKTAG_TUPLE blocks inplace
+		 * updates while our caller works.
+		 */
+		AcceptInvalidationMessages();
+	}
+}
+
+/*
  * SearchSysCacheCopy
  *
  * A convenience routine that does SearchSysCache and (if successful)
@@ -295,6 +390,28 @@ SearchSysCacheCopy(int cacheId,
 }
 
 /*
+ * SearchSysCacheLockedCopy1
+ *
+ * Meld SearchSysCacheLocked1 with SearchSysCacheCopy().  After the
+ * caller's heap_update(), it should UnlockTuple(InplaceUpdateTupleLock) and
+ * heap_freetuple().
+ */
+HeapTuple
+SearchSysCacheLockedCopy1(int cacheId,
+						  Datum key1)
+{
+	HeapTuple	tuple,
+				newtuple;
+
+	tuple = SearchSysCacheLocked1(cacheId, key1);
+	if (!HeapTupleIsValid(tuple))
+		return tuple;
+	newtuple = heap_copytuple(tuple);
+	ReleaseSysCache(tuple);
+	return newtuple;
+}
+
+/*
  * SearchSysCacheExists
  *
  * A convenience routine that just probes to see if a tuple can be found.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index af7d8fd..b078a6e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -482,6 +482,9 @@ typedef struct ResultRelInfo
 	/* Have the projection and the slots above been initialized? */
 	bool		ri_projectNewInfoValid;
 
+	/* updates do LockTuple() before oldtup read; see README.tuplock */
+	bool		ri_needLockTagTuple;
+
 	/* triggers to be fired, if any */
 	TriggerDesc *ri_TrigDesc;
 
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 934ba84..810b297 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -47,6 +47,8 @@ typedef int LOCKMODE;
 
 #define MaxLockMode				8	/* highest standard lock mode */
 
+/* See README.tuplock section "Locking to write inplace-updated tables" */
+#define InplaceUpdateTupleLock ExclusiveLock
 
 /* WAL representation of an AccessExclusiveLock on a table */
 typedef struct xl_standby_lock
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 03a27dd..b541911 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -43,9 +43,14 @@ extern HeapTuple SearchSysCache4(int cacheId,
 
 extern void ReleaseSysCache(HeapTuple tuple);
 
+extern HeapTuple SearchSysCacheLocked1(int cacheId,
+									   Datum key1);
+
 /* convenience routines */
 extern HeapTuple SearchSysCacheCopy(int cacheId,
 									Datum key1, Datum key2, Datum key3, Datum key4);
+extern HeapTuple SearchSysCacheLockedCopy1(int cacheId,
+										   Datum key1);
 extern bool SearchSysCacheExists(int cacheId,
 								 Datum key1, Datum key2, Datum key3, Datum key4);
 extern Oid	GetSysCacheOid(int cacheId, AttrNumber oidcol,
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index fe26984..b5fe8b0 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -100,7 +100,7 @@ f
 step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
 step c2: COMMIT;
 
-starting permutation: b3 sfu3 b1 grant1 read2 as3 addk2 r3 c1 read2
+starting permutation: b3 sfu3 b1 grant1 read2 addk2 r3 c1 read2
 step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
 step sfu3: 
 	SELECT relhasindex FROM pg_class
@@ -124,7 +124,6 @@ relhasindex
 f          
 (1 row)
 
-step as3: LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE;
 step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step grant1: <... completed>
@@ -155,9 +154,11 @@ step b1: BEGIN;
 step grant1: 
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
  <waiting ...>
-step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
-step c2: COMMIT;
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step addk2: <... completed>
+ERROR:  deadlock detected
 step grant1: <... completed>
+step c2: COMMIT;
 step c1: COMMIT;
 step read2: 
 	SELECT relhasindex FROM pg_class
@@ -195,9 +196,8 @@ relhasindex
 f          
 (1 row)
 
-s4: WARNING:  got: tuple concurrently updated
-step revoke4: <... completed>
 step r3: ROLLBACK;
+step revoke4: <... completed>
 
 starting permutation: b1 drop1 b3 sfu3 revoke4 c1 r3
 step b1: BEGIN;
@@ -224,6 +224,6 @@ relhasindex
 -----------
 (0 rows)
 
-s4: WARNING:  got: tuple concurrently deleted
+s4: WARNING:  got: cache lookup failed for relation REDACTED
 step revoke4: <... completed>
 step r3: ROLLBACK;
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 3a74406..07307e6 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -194,7 +194,7 @@ step simplepartupdate_noroute {
 	update parttbl set b = 2 where c = 1 returning *;
 }
 
-# test system class updates
+# test system class LockTuple()
 
 step sys1	{
 	UPDATE pg_class SET reltuples = 123 WHERE oid = 'accounts'::regclass;
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index d07ed3b..2992c85 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -14,6 +14,7 @@ teardown
 
 # heap_update()
 session s1
+setup	{ SET deadlock_timeout = '100s'; }
 step b1	{ BEGIN; }
 step grant1	{
 	GRANT SELECT ON intra_grant_inplace TO PUBLIC;
@@ -25,6 +26,7 @@ step c1	{ COMMIT; }
 
 # inplace update
 session s2
+setup	{ SET deadlock_timeout = '10ms'; }
 step read2	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass;
@@ -48,7 +50,6 @@ step sfu3	{
 	SELECT relhasindex FROM pg_class
 	WHERE oid = 'intra_grant_inplace'::regclass FOR UPDATE;
 }
-step as3	{ LOCK TABLE intra_grant_inplace IN ACCESS SHARE MODE; }
 step r3	{ ROLLBACK; }
 
 # Additional heap_update()
@@ -74,8 +75,6 @@ step keyshr5	{
 teardown	{ ROLLBACK; }
 
 
-# XXX extant bugs: permutation comments refer to planned future LockTuple()
-
 permutation
 	b1
 	grant1
@@ -118,7 +117,6 @@ permutation
 	b1
 	grant1(r3)	# acquire LockTuple(), await sfu3 xmax
 	read2
-	as3			# XXX temporary until patch adds locking to addk2
 	addk2(c1)	# block in LockTuple() behind grant1
 	r3			# unblock grant1; addk2 now awaits grant1 xmax
 	c1
@@ -128,8 +126,8 @@ permutation
 	b2
 	sfnku2
 	b1
-	grant1(c2)		# acquire LockTuple(), await sfnku2 xmax
-	addk2			# block in LockTuple() behind grant1 = deadlock
+	grant1(addk2)	# acquire LockTuple(), await sfnku2 xmax
+	addk2(*)		# block in LockTuple() behind grant1 = deadlock
 	c2
 	c1
 	read2
@@ -140,7 +138,7 @@ permutation
 	grant1
 	b3
 	sfu3(c1)	# acquire LockTuple(), await grant1 xmax
-	revoke4(sfu3)	# block in LockTuple() behind sfu3
+	revoke4(r3)	# block in LockTuple() behind sfu3
 	c1
 	r3			# revoke4 unlocks old tuple and finds new
 
inplace125-no-exception-for-indexes-v10.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Require tuple locks for heap_update() of RELKIND_INDEX pg_class rows.
    
    [To be squashed into inplace120-locktag if we're keeping it]

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 818cd7f..d3f3e29 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -168,18 +168,17 @@ using locktags as follows.  While DDL code is the main audience, the executor
 follows these rules to make e.g. "MERGE INTO pg_class" safer.  Locking rules
 are per-catalog:
 
-  pg_class systable_inplace_update_begin() callers: before the call, acquire a
-  lock on the relation in mode ShareUpdateExclusiveLock or stricter.  If the
-  update targets a row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX),
-  that lock must be on the table.  Locking the index rel is not necessary.
-  (This allows VACUUM to overwrite per-index pg_class while holding a lock on
-  the table alone.) systable_inplace_update_begin() acquires and releases
-  LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each
-  tuple it overwrites.
+  pg_class systable_inplace_update_begin() callers: if the pg_class row
+  pertains to an index (but not RELKIND_PARTITIONED_INDEX), no lock is
+  required.  Otherwise, before the call, acquire a lock on the relation in
+  mode ShareUpdateExclusiveLock or stricter.  systable_inplace_update_begin()
+  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
+  ExclusiveLock, on each tuple it overwrites.
 
-  pg_class heap_update() callers: before copying the tuple to modify, take a
-  lock on the tuple, a ShareUpdateExclusiveLock on the relation, or a
-  ShareRowExclusiveLock or stricter on the relation.
+  pg_class heap_update() callers: acquire a lock before copying the tuple to
+  modify.  If the pg_class row pertains to an index, lock the tuple.
+  Otherwise, lock the tuple, get a ShareUpdateExclusiveLock on the relation,
+  or get a ShareRowExclusiveLock or stricter on the relation.
 
   SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
   Most heap_update() callers already hold a suitable lock on the relation for
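
Stated as code, the revised rule for heap_update() callers reduces to
this (a condensed illustration, not text from the patch; pg_class is the
opened catalog relation, tup the fetched row, and the two calls are
alternatives, not a sequence):

	/* index row: the tuple lock alone now suffices */
	LockTuple(pg_class, &tup->t_self, InplaceUpdateTupleLock);

	/* non-index row: the tuple lock, or a relation lock, works */
	LockRelationOid(relid, ShareUpdateExclusiveLock);	/* or ShareRowExclusiveLock or stricter */
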
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d75f7a..d943dc8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4131,19 +4131,10 @@ check_lock_if_inplace_updateable_rel(Relation relation,
 					dbid = InvalidOid;
 				else
 					dbid = MyDatabaseId;
-
-				if (classForm->relkind == RELKIND_INDEX)
-				{
-					Relation	irel = index_open(relid, AccessShareLock);
-
-					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
-					index_close(irel, AccessShareLock);
-				}
-				else
-					SET_LOCKTAG_RELATION(tag, dbid, relid);
-
-				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
-					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+				SET_LOCKTAG_RELATION(tag, dbid, relid);
+				if (classForm->relkind == RELKIND_INDEX ||
+					(!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					 !LockHeldByMe(&tag, ShareRowExclusiveLock, true)))
 					elog(WARNING,
 						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
 						 NameStr(classForm->relname),
@@ -4181,21 +4172,14 @@ check_inplace_rel_lock(HeapTuple oldtup)
 	Oid			dbid;
 	LOCKTAG		tag;
 
+	if (classForm->relkind == RELKIND_INDEX)
+		return;
+
 	if (IsSharedRelation(relid))
 		dbid = InvalidOid;
 	else
 		dbid = MyDatabaseId;
-
-	if (classForm->relkind == RELKIND_INDEX)
-	{
-		Relation	irel = index_open(relid, AccessShareLock);
-
-		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
-		index_close(irel, AccessShareLock);
-	}
-	else
-		SET_LOCKTAG_RELATION(tag, dbid, relid);
-
+	SET_LOCKTAG_RELATION(tag, dbid, relid);
 	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
 		elog(WARNING,
 			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e4608b9..579cc0d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1558,6 +1558,8 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 				newClassTuple;
 	Form_pg_class oldClassForm,
 				newClassForm;
+	ItemPointerData oldClassTid,
+				newClassTid;
 	HeapTuple	oldIndexTuple,
 				newIndexTuple;
 	Form_pg_index oldIndexForm,
@@ -1569,6 +1571,10 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 
 	/*
 	 * Take a necessary lock on the old and new index before swapping them.
+	 * Since the caller holds session-level locks, this shouldn't deadlock.
+	 * The tuple locks come next, and deadlock is possible there.  There's no
+	 * good use case for altering the temporary index of a REINDEX
+	 * CONCURRENTLY, so don't put effort into avoiding said deadlock.
 	 */
 	oldClassRel = relation_open(oldIndexId, ShareUpdateExclusiveLock);
 	newClassRel = relation_open(newIndexId, ShareUpdateExclusiveLock);
@@ -1576,15 +1582,17 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	/* Now swap names and dependencies of those indexes */
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
 
-	oldClassTuple = SearchSysCacheCopy1(RELOID,
-										ObjectIdGetDatum(oldIndexId));
+	oldClassTuple = SearchSysCacheLockedCopy1(RELOID,
+											  ObjectIdGetDatum(oldIndexId));
 	if (!HeapTupleIsValid(oldClassTuple))
 		elog(ERROR, "could not find tuple for relation %u", oldIndexId);
-	newClassTuple = SearchSysCacheCopy1(RELOID,
-										ObjectIdGetDatum(newIndexId));
+	newClassTuple = SearchSysCacheLockedCopy1(RELOID,
+											  ObjectIdGetDatum(newIndexId));
 	if (!HeapTupleIsValid(newClassTuple))
 		elog(ERROR, "could not find tuple for relation %u", newIndexId);
 
+	oldClassTid = oldClassTuple->t_self;
+	newClassTid = newClassTuple->t_self;
 	oldClassForm = (Form_pg_class) GETSTRUCT(oldClassTuple);
 	newClassForm = (Form_pg_class) GETSTRUCT(newClassTuple);
 
@@ -1597,8 +1605,10 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	newClassForm->relispartition = oldClassForm->relispartition;
 	oldClassForm->relispartition = isPartition;
 
-	CatalogTupleUpdate(pg_class, &oldClassTuple->t_self, oldClassTuple);
-	CatalogTupleUpdate(pg_class, &newClassTuple->t_self, newClassTuple);
+	CatalogTupleUpdate(pg_class, &oldClassTid, oldClassTuple);
+	UnlockTuple(pg_class, &oldClassTid, InplaceUpdateTupleLock);
+	CatalogTupleUpdate(pg_class, &newClassTid, newClassTuple);
+	UnlockTuple(pg_class, &newClassTid, InplaceUpdateTupleLock);
 
 	heap_freetuple(oldClassTuple);
 	heap_freetuple(newClassTuple);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 78f9678..402dc49 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1074,20 +1074,28 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 				relfilenumber2;
 	RelFileNumber swaptemp;
 	char		swptmpchr;
+	ItemPointerData otid1,
+				otid2;
 	Oid			relam1,
 				relam2;
 
-	/* We need writable copies of both pg_class tuples. */
+	/*
+	 * We need writable copies of both pg_class tuples.  Since r2 is new in
+	 * this transaction, no other process should be getting the tuple lock for
+	 * that one.  Hence, order of tuple lock acquisition doesn't matter.
+	 */
 	relRelation = table_open(RelationRelationId, RowExclusiveLock);
 
-	reltup1 = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(r1));
+	reltup1 = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(r1));
 	if (!HeapTupleIsValid(reltup1))
 		elog(ERROR, "cache lookup failed for relation %u", r1);
+	otid1 = reltup1->t_self;
 	relform1 = (Form_pg_class) GETSTRUCT(reltup1);
 
-	reltup2 = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(r2));
+	reltup2 = SearchSysCacheLockedCopy1(RELOID, ObjectIdGetDatum(r2));
 	if (!HeapTupleIsValid(reltup2))
 		elog(ERROR, "cache lookup failed for relation %u", r2);
+	otid2 = reltup2->t_self;
 	relform2 = (Form_pg_class) GETSTRUCT(reltup2);
 
 	relfilenumber1 = relform1->relfilenode;
@@ -1252,10 +1260,8 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		CatalogIndexState indstate;
 
 		indstate = CatalogOpenIndexes(relRelation);
-		CatalogTupleUpdateWithInfo(relRelation, &reltup1->t_self, reltup1,
-								   indstate);
-		CatalogTupleUpdateWithInfo(relRelation, &reltup2->t_self, reltup2,
-								   indstate);
+		CatalogTupleUpdateWithInfo(relRelation, &otid1, reltup1, indstate);
+		CatalogTupleUpdateWithInfo(relRelation, &otid2, reltup2, indstate);
 		CatalogCloseIndexes(indstate);
 	}
 	else
@@ -1264,6 +1270,8 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		CacheInvalidateRelcacheByTuple(reltup1);
 		CacheInvalidateRelcacheByTuple(reltup2);
 	}
+	UnlockTuple(relRelation, &otid1, InplaceUpdateTupleLock);
+	UnlockTuple(relRelation, &otid2, InplaceUpdateTupleLock);
 
 	/*
 	 * Now that pg_class has been updated with its relevant information for
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7870e93..5ba228d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14302,7 +14302,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 	/* Get its pg_class tuple, too */
 	class_rel = table_open(RelationRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relationOid));
+	tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relationOid));
 	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", relationOid);
 	tuple_class = (Form_pg_class) GETSTRUCT(tuple);
@@ -14392,7 +14392,9 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 	 * If the new owner is the same as the existing owner, consider the
 	 * command to have succeeded.  This is for dump restoration purposes.
 	 */
-	if (tuple_class->relowner != newOwnerId)
+	if (tuple_class->relowner == newOwnerId)
+		UnlockTuple(class_rel, &tuple->t_self, InplaceUpdateTupleLock);
+	else
 	{
 		Datum		repl_val[Natts_pg_class];
 		bool		repl_null[Natts_pg_class];
@@ -14452,6 +14454,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(class_rel), repl_val, repl_null, repl_repl);
 
 		CatalogTupleUpdate(class_rel, &newtuple->t_self, newtuple);
+		UnlockTuple(class_rel, &tuple->t_self, InplaceUpdateTupleLock);
 
 		heap_freetuple(newtuple);
 
#64Nitin Motiani
nitinmotiani@google.com
In reply to: Noah Misch (#63)
Re: race condition in pg_class

On Thu, Sep 5, 2024 at 1:27 AM Noah Misch <noah@leadboat.com> wrote:

On Wed, Sep 04, 2024 at 09:00:32PM +0530, Nitin Motiani wrote:

How about this alternative then? The tuple length check
and the elog(ERROR) get their own function. Something like
heap_inplace_update_validate or
heap_inplace_update_validate_tuple_length. In that case, it would
look like this:

genam.c:systable_inplace_update_finish
heapam.c:heap_inplace_update_validate/heap_inplace_update_precheck
PreInplace_Inval
START_CRIT_SECTION
heapam.c:heap_inplace_update
BUFFER_LOCK_UNLOCK
AtInplace_Inval
END_CRIT_SECTION
UnlockTuple
AcceptInvalidationMessages

This is starting to get complicated, though, so I don't have any issues
with just renaming heap_inplace_update to
heap_inplace_update_and_unlock.
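
Rendered as code, that sequence is roughly the sketch below.  Argument
lists are abbreviated, and heap_inplace_update_validate is the
hypothetical name from the list above, so treat this as an illustration
of the proposed ordering rather than the patch's code:

	heap_inplace_update_validate(relation, tuple);	/* hypothetical */
	PreInplace_Inval();
	START_CRIT_SECTION();
	heap_inplace_update(relation, tuple);	/* the buffer-locked write */
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
	AtInplace_Inval();
	END_CRIT_SECTION();
	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
	AcceptInvalidationMessages();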

Complexity aside, I don't see the _precheck design qualifying as a modularity
improvement.

Assert(rel->ri_needLockTagTuple == IsInplaceUpdateRelation(rel->relationDesc));

This can safeguard against users of ResultRelInfo missing this field.

v10 does the rename and adds that assertion. This question remains open:

Looks good. A couple of minor comments:
1. In the inplace110 commit message, there are still references to
heap_inplace_update. Should it be clarified that the function has been
renamed?
2. Should there be a comment above the ri_needLockTagTuple definition in
execnodes.h noting that we cache this value to avoid a function call to
IsInplaceUpdateRelation for every tuple? Similar to how the comment
above ri_TrigFunctions mentions that it is cached lookup info.

On Thu, Aug 22, 2024 at 12:32:00AM -0700, Noah Misch wrote:

On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:

How many of those are for RELKIND_INDEX vs. tables? I'm wondering if we should
always require a tuple lock on indexes, and whether that would make a difference.

Three sites. See attached inplace125 patch. Is it a net improvement? If so,
I'll squash it into inplace120.

If nobody has an opinion, I'll discard inplace125. I feel it's not a net
improvement, but either way is fine with me.

Seems moderately simpler to me, but there is still special handling
for RELKIND_INDEX; instead of doing it in
systable_inplace_update_begin, we have a special case in heap_update.
So overall it's only a small improvement, and I'm fine either way.

Thanks & Regards
Nitin Motiani
Google

#65Noah Misch
noah@leadboat.com
In reply to: Nitin Motiani (#64)
Re: race condition in pg_class

On Thu, Sep 05, 2024 at 07:10:04PM +0530, Nitin Motiani wrote:

On Thu, Sep 5, 2024 at 1:27 AM Noah Misch <noah@leadboat.com> wrote:

On Wed, Sep 04, 2024 at 09:00:32PM +0530, Nitin Motiani wrote:

Assert(rel->ri_needLockTagTuple == IsInplaceUpdateRelation(rel->relationDesc));

This can safeguard against users of ResultRelInfo missing this field.

v10 does the rename and adds that assertion. This question remains open:

Looks good. A couple of minor comments:
1. In the inplace110 commit message, there are still references to
heap_inplace_update. Should it be clarified that the function has been
renamed?

PGXN has only one caller of this function, so I think that wouldn't help
readers enough. If someone gets a compiler error about the old name, they'll
figure it out without commit log guidance. If a person doesn't get a compiler
error, they didn't need to read about the fact of the rename.

2. Should there be a comment above the ri_needLockTagTuple definition in
execnodes.h noting that we cache this value to avoid a function call to
IsInplaceUpdateRelation for every tuple? Similar to how the comment
above ri_TrigFunctions mentions that it is cached lookup info.

Current comment:

/* updates do LockTuple() before oldtup read; see README.tuplock */
bool ri_needLockTagTuple;

Once the comment doesn't fit in one line, pgindent rules make it take a
minimum of four lines. I don't think words about avoiding function calls
would add enough value to justify the vertical space, because a person
starting to remove it would see where it's called. That's not to say the
addition would be negligent. If someone else were writing the patch and had
included that, I wouldn't be deleting the material.

#66Nitin Motiani
nitinmotiani@google.com
In reply to: Noah Misch (#65)
Re: race condition in pg_class

On Fri, Sep 6, 2024 at 3:34 AM Noah Misch <noah@leadboat.com> wrote:

On Thu, Sep 05, 2024 at 07:10:04PM +0530, Nitin Motiani wrote:

On Thu, Sep 5, 2024 at 1:27 AM Noah Misch <noah@leadboat.com> wrote:

On Wed, Sep 04, 2024 at 09:00:32PM +0530, Nitin Motiani wrote:

Assert(rel->ri_needLockTagTuple == IsInplaceUpdateRelation(rel->relationDesc));

This can safeguard against users of ResultRelInfo missing this field.

v10 does the rename and adds that assertion. This question remains open:

Looks good. A couple of minor comments:
1. In the inplace110 commit message, there are still references to
heap_inplace_update. Should it be clarified that the function has been
renamed?

PGXN has only one caller of this function, so I think that wouldn't help
readers enough. If someone gets a compiler error about the old name, they'll
figure it out without commit log guidance. If a person doesn't get a compiler
error, they didn't need to read about the fact of the rename.

2. Should there be a comment above the ri_needLockTagTuple definition in
execnodes.h noting that we cache this value to avoid a function call to
IsInplaceUpdateRelation for every tuple? Similar to how the comment
above ri_TrigFunctions mentions that it is cached lookup info.

Current comment:

/* updates do LockTuple() before oldtup read; see README.tuplock */
bool ri_needLockTagTuple;

Once the comment doesn't fit in one line, pgindent rules make it take a
minimum of four lines. I don't think words about avoiding function calls
would add enough value to justify the vertical space, because a person
starting to remove it would see where it's called. That's not to say the
addition would be negligent. If someone else were writing the patch and had
included that, I wouldn't be deleting the material.

Thanks. I have no other comments.

#67Noah Misch
noah@leadboat.com
In reply to: Nitin Motiani (#66)
Re: race condition in pg_class

On Fri, Sep 06, 2024 at 03:22:48PM +0530, Nitin Motiani wrote:

Thanks. I have no other comments.

https://commitfest.postgresql.org/49/5090/ remains in status="Needs review".
When someone moves it to status="Ready for Committer", I will commit
inplace090, inplace110, and inplace120 patches. If one of you is comfortable
with that, please modify the status.

#68Nitin Motiani
nitinmotiani@google.com
In reply to: Noah Misch (#67)
Re: race condition in pg_class

On Sat, Sep 7, 2024 at 12:25 AM Noah Misch <noah@leadboat.com> wrote:

On Fri, Sep 06, 2024 at 03:22:48PM +0530, Nitin Motiani wrote:

Thanks. I have no other comments.

https://commitfest.postgresql.org/49/5090/ remains in status="Needs review".
When someone moves it to status="Ready for Committer", I will commit
inplace090, inplace110, and inplace120 patches. If one of you is comfortable
with that, please modify the status.

Done.

#69Noah Misch
noah@leadboat.com
In reply to: Nitin Motiani (#68)
1 attachment(s)
Re: race condition in pg_class

On Mon, Sep 09, 2024 at 10:55:32AM +0530, Nitin Motiani wrote:

On Sat, Sep 7, 2024 at 12:25 AM Noah Misch <noah@leadboat.com> wrote:

https://commitfest.postgresql.org/49/5090/ remains in status="Needs review".
When someone moves it to status="Ready for Committer", I will commit
inplace090, inplace110, and inplace120 patches. If one of you is comfortable
with that, please modify the status.

Done.

FYI, here are the branch-specific patches. I plan to push these after the v17
release freeze lifts next week. Notes from the back-patch:

1. In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose a
concurrent inplace update. The v14+ fix relied on commit 86dc900 "Rework
planning and execution of UPDATE and DELETE", which moved the last fetch of
the pre-update tuple into nodeModifyTable.c. Fixing that was always optional.
I prefer leaving it unfixed in those two branches, as opposed to writing a fix
specific to those branches. Here's what I put in v13 and v12 (a sketch of the
matching unlock follows these notes):

 		/*
+		 * We lack the infrastructure to follow rules in README.tuplock
+		 * section "Locking to write inplace-updated tables".  Specifically,
+		 * we lack infrastructure to lock tupleid before this file's
+		 * ExecProcNode() call fetches the tuple's old columns.  Just take a
+		 * lock that silences check_lock_if_inplace_updateable_rel().  This
+		 * doesn't actually protect inplace updates like those rules intend,
+		 * so we may lose an inplace update that overlaps a superuser running
+		 * "UPDATE pg_class" or "UPDATE pg_database".
+		 */
+#ifdef USE_ASSERT_CHECKING
+		if (IsInplaceUpdateRelation(resultRelationDesc))
+		{
+			lockedtid = *tupleid;
+			LockTuple(resultRelationDesc, &lockedtid, InplaceUpdateTupleLock);
+		}
+		else
+			ItemPointerSetInvalid(&lockedtid);
+#endif

2. The other area of tricky conflicts was the back-patch in
ExecMergeMatched(), from v17 to v16.

3. I've added inplace088-SetRelationTableSpace, a back-patch of refactoring
commits 4c9c359 and 2484329 to v13 and v12. Before those commits, we held the
modifiable copy of the relation's pg_class row throughout a
table_relation_copy_data(). Back-patching it avoids a needless long-duration
LOCKTAG_TUPLE, and it's better than implementing a novel way to avoid that.
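
As a companion to note 1, here is a sketch of how the assert-only lock would
be released once the UPDATE is done with the tuple. UnlockTuple() and
ItemPointerIsValid() are existing PostgreSQL calls, but this exact placement
is an assumption for illustration, not quoted from the posted patch:

#ifdef USE_ASSERT_CHECKING
	/* release the lock taken above, if one was taken */
	if (ItemPointerIsValid(&lockedtid))
		UnlockTuple(resultRelationDesc, &lockedtid, InplaceUpdateTupleLock);
#endif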

Attachments:

inplace-thru-120.tar.gz (application/x-tar-gz)
�������8G���PHv2w�l:N�R�[�jkCfN5�],��q��m���(<� �	>p�#&.U�/�m
���(6�d^�����JD�{�7"�_}'b�B�c5�.���'��V��z�x�*������_w�����GY^:Nq��vWq��a������;����rnc�oE2�
)o�t/l�e��a
�Y�8��#T���D�!s�5[���t!�
(h�+�����^w�������<�`�����������u���&�1�qw<FVw�iw7�"r���{�����Gm���B"�q�������5���%�[�R������	�8��0��I��]T�9�r��6�1����#��g��iq|�����r�I��ZOY�m,<<��d]<��S>D����V�����1�H�+B��J����\p����������@��>�W�'��4�������D�zZ�6g}Rm����brsK�K�c ���0�d�5��=�XKh)b�5q(A��Y�:� �*�fB
���(�0��g������Q�m#��+��U��.Dw�/i|�2q�L������gy�/�oG#a���W}�/�F�`������1t]���A71�+}����0���_q�<���\��0!X���V��zN�!x��f2A&*���j����F5gr\5�'E�c�J���PV�"����:��W�������-����7������D���-��I��H��E���PQ��������_��M��]��*S6��f�u���Xm��w���i��q3�H��c�����a��q�XT�k���C*�m�&���4QW|�I���K� oE���|�3�q� �xHD^���~@w
�O\7%j��	�����u��F}�
���YK�$X����t��U���
��F2����l>l3Hvv�W���
��5
=P����w����=4�g������	M*!���C������D���q�!�����DO���������(�]"��������.#�����Y�@;!����	Zi^���7x�x��5a��b�|���}BsG5"������Fc�E���D�z��R{�����N��c#�_Km�^�%4�Y2�������K�O[
%��~��ndQ��3P����0y#X��8�Fh���`}Ej�	����i�����C���j��t����H�c��;�POiZ*b\��~�����G����o�1��� �Ra*�P��d��B����$���
)Y�.�&*��r��ku.�`��'����O2��{1�I%��!������P2��im2+Qs�qoCq��N���!a��^J%��b�@|Hn���C��N��� a�����w^�A0
l�J_�c��N��	�As�/��T�'`����d=���[��N�&�����m��,��#�����P�+��PcNI_�Pc��������e�%��=I���=^��C$��V������H������
�Wj���/�=L�w�PZ����1�h'q#�V<AN��b\I�w6�]���7���I�Ri�&N{�AM�������+���K�
�6x�����rt}
���\��p������~�W/�B��E��K4yO���3Z�W?����������������P������T�i��������rg5ED>����������J��j��j7FS��3UDb������N(��Y�E���H(�C�1�m�b����7lpl�>��kFg�Z�����e��yS��R����w����M��g��6�{������rZ��n�~zl�
�oA�1�`(�6L���e��%����gr��l�����^#^���4*�as�u������h���$cSq\*�����["�����:9��]���7D�����N���������_��zQ��
�0�i�]�q�QX�";V��3�������)7�sTPW#��D��$U�lX?Voz��/U��J��{?��I�K�o{g�'X�1������<�{7��|{}ua�($��R~B�����j����*8x$/��������684����ws�n�?��)��>w��0�G�R�|+�������7'����>V@Xe��{ b�J�l��s�!�<���v�
P�)V�����^�w����O��D3��-����}������#�j=Vj�Ifz�p�n����O��'$�{Ho���+	�Io�X	����3m��O�<�0����Z��my��Z�4�����t���*H#�S�H����~c���*1q�1c�Z�I��$r�;;9�d&v������ |������(�����[Knq�����7lU+���������\~���[�#,���ue_cENA�})QP�/��c������3���C8���2S��������p ��f���s���"�������a���=��6%��@7�J%FBju���=��|E�B�����w'�;�Mw�]�m�����W���>�Tj�nm�������-�����q�,�
�������^�{s�1�S���Dg��?h��Z��sM?p�Q��5��h(���|�Z��s��s�,��KD������uJ���.}��}�=���u�P1���NL9_Wb����XT������k����i�*���U�a
F��8k���<Q	�
O��5]��������
���|� ���]xloj�	�	b��?����)H��/�KY�b���a��#�<;mnhn�BP/|�����.����i�~��f4��8��#��K?p�G��]��	�Q�/��
-%�<�����/%/�����C����x%�i�����k9��&�2�Gi���98]���O6(�,>�k<(e=(�Z�`���c��2�����O�J(k���B�`�DD�<��|Y���0:5"=��Vyg��G|��t�4f�������3�d���m��e�?���=����MBf`
�x���V�W���<\�k���+�����c� e�����G���J��U�iU���i7c�������r�v������W�Z�Q����]���5Z�J}��o.������������u����b���������wW��=<�@���b���b�wi��������`% �����gA�W����iz��wg�c�/�t����$�Ca��)�����n#(�Y��+*��������?.�"����������|���zAE����'�<xj��!��L�uHU^���	�70-��;�P��
6��NM��T��[1��-r���j��9���3$#D` ����>N��xT�{���|�X����k�����y��W�ex������*���
9_�M�V����z�u+�+������a��??���CM�%��QB�N}����J�����Q#?H7��t�nV)>�3��v�d��%Nf����n��:���e�Kj<�h�0u�D(x�3���b�o�9~Cx�x��
����X�{�y'A����O��2JQ,�{L0��$�X��t���W"WZ%(���P\�o����g�r�%���9Q���/�4�%v��.x�\�.���������5�4�%SwL�Tb��."�G���#8x�.����"��\��Z��SW*�A���
W�$�a)�.=eP�`�-pH(Vb���4�`'��X������������2����v�1��U���
�����~d�p<���;��}���x��U�E�\LE�]���R9mUs���q��p�Pe��}.�Uk�z\����������m
r�o31�	��:2?�@VXG��������\�r�C	����q0�N��pT��l��Wo�������
�A�k�8�:��?��l��l��B�6�k���z����f����Z�j���<�9LK�h116�j�AZ�d��P��#��z^����Q����*���^/�(4+-G�&r(�w�.����O7;�����g�9N�,a�~��W���#?�?��F���?�]u��o����K~d�n9�aW�%���?���_��{�0�������S?�5��y����N3e��~"�L
��ug�In�K�29	)=���c��1�F'�k��7+4��8@�S:��s��K��#�k�(��"NE��EV#��cZ�D�TK��B4�;�oV��]%�����XG~�	�����CG�
��6$ga��JxV�#q�I�� �B���!�o��Di���'��=�^��AU���]Y�c�IA�S��F]�������������-){�����P��*l���z��Q�]u�;�!�e�).u`
t9v+�UgoIy+������2�`X���=���@��CzO������rc1�;��T�1�Q���5���S�l ���1�wD��)�%c�G�9��%����WSw1-���s��]��d�(A�*�~�d�}r��s�~�n�������h�����^<����]�v��U��}ZL�2\��~W�������]7�������s�����]��b2��V�1�����C����;��:UJ'��|��P4��hT=J�hl=�$�(��17������B;3
���Y�`E8���lN���L	��U"����s���������?U���Ny�'�-�	�������W�g����Aw	�I�HZ r�9J>�IN��tC�F�<��������k����-��#w��o|Zu�Q��2����m�l".�N2x*��l��T*;�*��C
�&��*������b8B��
$��R4���Q��o���=���F�a��t����68J�>(oralX�����r��V*����G�\�����~�
�9\�^�c�p]�9nc���0wC�Fm���]�%l��x����8,@+���9X��?]��40����H|��\�?���+�`
�W��q
[=�!��#P��G��?XG?�1���L���1�������g��N2��(!���}�.�j�i�������=��dvR�<�����=LW��%X��P��8Y�}�L��X&`6��oa������`���q���`���4���G�r!@I��%���#~��{L�!��#���C	��v��K��/������|��%���4Zw����Gw���`��#E���xr������AN�-d�V6����B����[m�*��������3+��M1\{�N�Z����?M\$6�u;@��F�1=+��k���uVg|�T���D��EI%��q	�m���tU��Cu5���9���EZ��C��+yJ�g�,�>H������"]0�]��G����REt,D�Q8|��@_��q��B�<�3����&��K�A-3\y�C���?�x$(
�-�+J�U=W��.���K	���~��`�#&�X�?x:������� ���@�����eK�����a�4����A�lG-:;�7�4�	Sj\%8DN0F��9�e�)T�'��o,��=~G���b5�:|3��dR:Q�|����4���&&�"��pt�a�?���'F*�e��:�"p��b��VO�����H���{�#`,����i�!�N$B��z�r��V�"]	nVswI������l���)V���J���,0t�b�L*��F�h��hZ�kV�t�/�Z��0��n����E�sJ:!!Ex+��`x6�C���u����?��0�&Smgg�e���[�hky���Qa���3	��Z4!�?�[�{s?x�O#�a�*���9
L����^��M4��&���a�)�a5���X!Fi�~��w�����l���]�q�/�%A��/,�'t�C����3�S#Wvv}���4��9�CX�3q��b����F�eT�����'aG���� 4D��x�b���V�\�����5H�5�x��7K�����y��G�Y�.�9.+���kc����2Au0'[!����d��A�2X���m0!�F��^��C��L{�-�hC�%)2��m�9���O.z�J����b�����wd��� �F����+q���Y��������!��Gnp�I3�PM"�(Kx�{��A�����
a���ta�$��F���i���#Xap&*�X��S��57j��o����5�v�1 A$���F
����FC��J�p�8_���D�y���5#P��D�?K��#l.L�2��A���:��b����DR�h3�?��Jx�!�-�*��e��iIg3o]�0��C��"_�<��m����,Zy����Id
b<����w��j^:%�}�j���{���}��L3<�i�l	�y)O��K^h���~���W*5=t!���8�8�d�z�'�y:�{��l�������7/	�'<����(b���8:�MH����f,+r�j�h3%��
�;^>����+B���������coAs�"6,^8�#�i�����O�gQ%�E|;A�)��B������>�'"J- k��"���7
�"-�-i�r/0���J_	�� .�]�~�7��D����]<��f�����+b2A�"�p�#���f0P��(u��8�0���@��P���������%�=��Ta���^#��4�P���#��89�^��(�'z�a�P�c#!��J#
�+�����h���o{�t�=������z�<�Kb<#k
�
O���{�`���}@��$�4��v^�,t:jO�T}��O��������:Vv,��CJ�I�8�B�W��OH����l�?���pT��7<*XM���
p�x]����,%�����){~��c���Dd"yb0�E�����/d�1`|���q"���EF��� xx�o~a�E���^��Wo��wz;����/��{��?����o{x}��-��<�3���g�
��VI�7����}&0���J��g#�[=����5'0DF����n��C)����"
����w�y �V���Ut�&�V��_J1��1��(DU cM���\*��9��qs������4��f_4���D{�	r��'���.P�����9�`X��Q@o%$-�[LqA���d��E����r.���;���q*�9�oVVx�\|l�7�=%
1kR}BM@H�k0�up����G��=��w~���
�@++2�pR;��Q�o����[m�?��:[�&S-�/1_\��
�E�d2Z��
[���F%��X���0,
9C�?��D�c0
pDz�t�ztj��m������a��d\�1���y�Mi4�
{��K�e!rFT������9�M��,�������qF�������U�9h�F5�3�J<�g`x������@��P=��L=1Q�S�/7������@���]�?5�8�.J��_<w<�*{d#�S������1�p�!���(>�7�����h>*k�:o���C��JR�KU�����'�'G�e��5n$|���Zi�fn��K��pn�p�>)��~)c����B��=��#��q�n;jZ$��
7�oZ�4�T��k3vN�O�p�(a����2q���*���Ln�������c\����!Ldw!���`zz���p�^���]�`���}�'���"����0�@����z9\-oo��+'<T'N2i7y������G����
���7y���J>�pu�M�v����r�vp�����	i20��Y'���v�����]�V����d(��\�T���?�&�@��������?c�7�Y�ZT;����Goq��fN`��J�5������P<������x��������F�_�o7`���.R+l���c�@48/�<KG��3��R�L�+P8�k�)��S$^�����
��x.q�`��(���v�^�
���}�����3&��A�:������/��u��j8������*�4��R�6RB�
�~�T`
A���I���0:GD�C���>V2-E���Me�%x������=rh��w}�-��/����4�z(����- o��p!?��cc�������
dO=���B��
�����wp�O#]m(�A��?��[DQzg�4�JL}���vf���7z�
@���}"�<����-��$�!<�%�y��
N�0�mw�@<$�ryN/,���WTy�aW���ZR����}�������Y�m�����D���<�=*�����=Lb��dA�?�v�ql��S*�
�h�����lJ�5�9�]�;?9����w`��x���Kq@���gZ*���������<��NY�����w����8������R_�V��q<�p�G����OC�r�6a�=������v����nJ�FM$2T(@�/A��+1��P_��KSvo���A�c�����3'6A����E�� Q
.�����������Mg������s*��@�'
�|6��e|��7������;�����W�S�F��1�)������F�L�c�N��S��
��(m�dY8;�$�X��}4!���8��A��j�b���=���#�	�}�������>����ov��W���|�z��b�"��[�m�{u8����J�1��n���JrC���(��a �A�}��?��}G���������"��ts�0A\����2�8�=n�Z��juJN�K�eZR`���������������X�����@�0s�������#q.�_W$��h�lP��������%K��q����]#�n��E}��|2�T/b��e���c�	�E����!���8��xT9�Q��Yw�����f�g(�������&��<��"
p� C��s�^�#�{�W�L/�,j.�n��$��%3�`�H���*�P���BF��~2
����w�������6�<P�� ���=�)A��g&��y*�dVzrt-����	��Ft�S]$=.T��3AG�>H���)z��@S����5���2�MW�,�����|�V)R�O���I��eNZ~���"�(��*���M����s�|������������%m{����������3N/�K�_��V����Bu�'���K��.m�/���P8���,�leH}�J��(
��hc�qI��`��#I
b������$��� ���������[��1�h��Q�k�gt�����Y�(������g\+M�D8T_������W�x���L.���^������u]xX
�#6>0Wu�]�D�+"��q�n�9����`�tx�D,�?Y��Q���>��]$�'���������?������'�7't���\��ls2^�~s��;3k���sN����zN�]�����
y����;��kO���AU����s�k<r&+P����L}�|c�~����Q���\�i�s�
E����<����Yj�����J.��
�7�pGk�.5nf�����-#��r�����Ic���d�������Bt|?�;_9FaHP�MD�HC�.�)H�H$����|=Wp4��
f�$z���:4qh:��D��H!(Q5�Gr�����l���at1>v�p������'\�����`Gl�C����^�ZH��y7�H�:K�_�>ti����
���kO�E��R�D�TR�t��>��[wt�����2vP��X��W�P��+P���<VB~�f/L�V�~�x%���u���Y�i��o���Zje�����]!c�����P�-��xQ�*�[�h*~/}������O��%���G�D���l>������,�AI�m��bq�oeP�����ABm'��wC�
�5�������`lv���:����P����"�e�v����6=�X*K��r�.�U��
<���u b�|��,�d�V-����l���r����1Lr���~��2�@�C�;�0��|�dF�A�-Z�A[FlqK`�#�c��z�4����I��M�j�2w���J��mu�V'�a�|>�+K� 7Y��6�79��[-88�a�XV:$�
C�N���F_�G8v�'2R�/�^Gg#:k������ Fy����"��N4���X�,wJB�����R��N�Q���k�Z���v���t����43=:���h\\�1�>��+�!XF���/lgug�f.�G2�g�uh�t�@�=��@���BY�#�R>R����)N#���D����r�f�
����E�W`�C!�G����{���Q����O�����I�pQ�a���[p��[�!��o��}����/��������R����d��e@\����[�@�!���G�W,�����Y�����!<�ch��+���AE/�����1D��,�v���"�geS�j�2Ve	��O�B)Sx0J?�&"�1���
�
�'�Z4Q�BaoG14��S�_�&4t
���+<�o3��������:�e��6O�Q����M����/Pi}}l�+8����9�<���v;����*��&�JD����i�c�Q�_?�x;"k'���V��
4Z��Y��:���_V��g]
EC���������w���$P�����~L�*W���e���I���1f�u��N�C�Xn��8�dB����W'/7���������Fn��-���}��'���6n�m��T����h��o�V���U(E�$���5�R�?�v)�BK�h�.�4�L�w�,��k8]��#��]G%}{U'�T��[�>��#0�z1
��)c_1�p=��`��K9�r2�@�$+�/An2?�Hm���Qo�J5X��F�T'�Wv�h&$��eI�l��
�}��H�Gf��N�g�l�0�������������>���Q�i�vE�
w�B!f��w���������5�o�/$mu������C<�"����M�!����a�;���"3�)����H^���#���R����d�6=�*
���_'u94�
�Kb������������|8f�#k��d�Oc�e-�����]6�$��Iv3J2��H6�%V��
��$�����Lx0,���H�v�5��e����V:4�����31����h���h�
��J��N:Y��k���3���������u<,��%����K{N������A> ;�?���rt����{���
7���G��b���A��`T*|������������9�Fxc4!������j
u��[�7���4��["��b4���x8 iFq�'���,��@��`:d��X8�:��J���^D�w���h1����"�2�a2�����L
�`����C}L����V�5�������n�'=s>>���$��x���8��F��������XB����MwA��|5�g��g���&U�e�| b�>����*��O��Ah���N��� ��@�<({4\9���8����V\W�P�U@0ng�l����%L]A:���aw&��J�*_�sX?��)
A���F���n}�o�Z��*r/v����������#�f��P�w�
��p�)�2�(RN�B`F�N�j����X�G9��h��������h�������,�2v|������k!����Z8�o-(]��-����;x��������E���X������������O	1S���J��u{��zk`���pk���mB6��B��Vw�B0KYC^��(RN���>�;�w������~FXR��<��s��so]���5�
�zc�qm�&�!tDW<C���,���c���5���h�:n�2j�k�d�i��
��)���a��p����!�N9�$*m�;W�����w�q,�_U,�F��J\�(x�](��G4����	����i�d}0�?�"���`�#�([�y�G���b3���������5H)�6P
r�b
=+UP�m���-��)��#�,}
WX���>�s:
�{V�U���st�Ai����6��d�=���O\~��\��,��U��At�N�)<�Y������hV���#���-J��l_E������P
@�T`���S���T2;7�OaPV�*�p�]�U�;�n���.=���-��-��������CjwT��~����+j�D���F�3x��v$w�X�������k)4��,��x��F�,�D�0��J��'	����T�*��������V�Z��QKU����x���5�Si��q������3�X���5����������F_Efi�/t%?��E��&]/9�Z-��djL���.�I���h�����n�I<�){J��gl*i3Znc��il�n�e|KpB7|����Sk���J�Vu��������Qj"�aC����&X�DVY2(B'�K��:�6��(�1�7����k$7!;������{%���d>�o&�An2�A~>������=�ZvHl9H���8����XUF�"s������z�'��q����?J���o�}����J�;�����+�<���L	��m20��S�������7�k�����Z�NL�����[��~����^��1�YrV�	�E��Q��cn&�-v3!O��9�� gck�7a[:zB=1S;�%�������$}"'��^`�������\J����.�@�'>����5o�5+��4��Qzm>��h�>�ar�����)�+o����`S��0|� �~��w���p�)�^8��i���+%� N���1��2�z������������/����R�|+����N��_"���_��Y.�?�gF���h�Mc3��}�������F
�����#��[��D�bo��8�K�/b���%���@WY�I���F��8�<S�b�[T����VB��T,r�������b�����h����;qX.8�9.�
#���B�D���Z�O[��z���v+��Wku;��Z�O_�f%���iR��o����X�����u���@�N�o{���I/�g��N���������_�t�02��3j�\���!o�^��Z�����8"�Y>y�_d����1�	��:c�_{P�V�#����U�]>-�;�LjHC�l��C�7��|/`�w���T8Y|X��I5�I����9ho�@XV���H���Z$^4����C+�u��dc1�6uR�Ga�
�zF%2u���:?��ku���wN�������v|��[�#���zQ��7��q����i��F(�.��z�Q��*9���Cji����j2s��������77�p�{������^q����Q\��y��
c������uwX�����U���&k��i-v4�
#�(_�B���>�Ot{���q��B��j���O���5�t4$��h2����k� �9��q��*:���t��h�������X�B$h����:�g���kt_�M���s����HW����8&z��2���@x���ex`2��|�S��yruB�d���K�g������aW�#��*�x��z��k���]R.!-�.�i�~��8�qp�#x:�7xT��j��Gtk5FBy}������7.LxZ���[q��B�[���7�\�?-[8bfS+C����#������]�k0.����q��pM�??�g��*4-{��8Z�*�v���o��^��M�]u�����v�+������F�O������}�o���7~���-�^����[�Z��~�����AS?�>�L��<���xU�r8+����q�&A��._��a�;
v����zu��d�^�������������8�>��U��V���z�n�����/N���8�6T�Zk�r�]����v���f�Zo7b�buoa�Dg����a���e$��v_9d�~NB����p�X�M�c��P��9^[a�����/
@�3���*�_��
�����������!&q���5E�W���&)�V���B�,�^>�F#���O2��Cf��+���f�L�����#&�;���W������:{���J]�j�/���zA�C�����_0��,���kD�=|$�ZBYPe?����E��w��)O	����B���4�+��h�� �Y��)D��[bc����E
��H\�L�y���q���#v�;�Pfc	���NM�R*\/�DmA	�4��#Ab����)lF����\��&D������_�
��e��M]��������4L0�BwLc�������:�.�H�i����'Nf�'����������3�k�t8V���2<~��3�+������by������?�}h���Q���*w�>-�.���������z����w�}wu��9L\�����K����������R���2:�2�be��C��.���!�%"����9�2���>�Yg��z���.
�Gie��S)��0����j��:J���%�U�So���kI"���@��u�������a@��Og�f>�P�H�����;'�gC�UWOe^x��g��l�R����D�����?�O���V����j�jG�h�� �g�b���BQs��BC�#&��pG�n�dg��r�p+z���D)�((��V��xtE�����!�aCU���Y�9��������5��E�^���x��4wI������GC�7{\�P�z#3��&����L_qH���.!10�!�Y���at��w�
u�d�v!<�hO�!<X�t�1Y�d�>X�|�Gk�)]�a�u����JL�'��#�0�&�
����������91Z?�0��Q���d�N	��(��	s��`Xy��j�@��-�.Jz1zT.1���"�c�G�����*�n�3���3�g�0�=�1�i������V��6���e�Y�C�>��������5�o>,nH8����W�."�����Z�]�8�F��n��T�y�b��_m�J5���?M\$6D�6@����_V����VRK�Q�����0��,�6��f;���PU_{�*����a�S=?��o�h'�R�!�^����jp��P<3<�0����
���j�����I�,��%�Hc�W��W�� |h9�e~�`n���*�8��`�N���,/��@R�p�bqP���gZ�"d�H��x��9�~0N%��[@�F�Y"�ps7��B�D�.A`�]%_4P;��'���i��KB��:��(�!� ]��D��W��`�y��B�B����d�������E��G�/]nG��|�������)$���q�G���$��:~#���bf�f�L�+��q����F���]�O���C�#�(�T���y)J�����.���X������;f�ix�������|o!�&H�h�(���i&�(�����N��f	�BMzI���j�����]_�c��>�2��3qZ������E����0���E>X��s���,��_�6�
'�x6�
# Y'~�Y&U$Y�v��)~N9 �tC�Q�4�vi�Mp��jb��d�U*�5X:�����M��~��BP�P���Up���:I�"7�y�U�L��t������.s�.�F��C��;���U`0L���@C����$�5���[��)L�x���;H��B�KgD�8�T
�����MU�Zx_��u�t��}VS�376K������z��L�����-A���)��K�R��$�����l"�D��a�&�����^L���Ot����x�Y�I�EJS���2r�qzh���q-�K�$�Z�S�@������!��%0YlJ��&!p����p2��C,]S&�1�4�8@Irz�p�x7��q�S������A"��\�#����`�dENY�Pf
o�� (~��r�rE�,}3D�"s�PoA_q�"6,^8�#4�$S��E�����nG�4�!�7V�.�9��Pr�6-)�>>x�`��l�dd2=��?�6��f����b%�w	�G�BEU
�����>�'��+2	%U�z�zux���	������7$8v)���W���{
�<s&�u>����x���9�S&f�q�1�$G��������N��O���o�?h��4!l"K��6����	a�
����%9��t�e�\�v`��G4�cU=�]b������J���M'����'�o�uI�8F��W�i��+�/`)�A��c��$�+��������D:TSc5}�f;���:W9�Y
��G�?.�/���b4�L�a1�����LEf����N�[���\z�;������No�W�2xw�����}�������i&��Lx��Zo�[��+�8�:^��w��B0��2�Il�bl#My����	Z��
�>M��[ �P�����"��������H1UtTA7��QB���7>������D������b�&V�}I�P�h(wz���5�����9���a1
����e^~�).<3����I�.Oe��T��#���iRr!���V+F�8�����>���`"�zJF�r�up���������@9k
A]�����'a �0o��+jG�f*�D���������O��cB���t���&������=�<b����j�4�i[��!��*$E�k��u?����`��'����i8�{��/���7���N�P�>,����[9�|�{��)�7	D%������U��A�X-����D{�29���g0���o�7�i���T���@nr��n%z5��A$�1���)�>-P�e���zb����_n�������!.��j�	j��6~����H��-�q��@/w~e\1\}�`P�@�I���Fm�E�Q���	y�F1Ff�xNG��A�!� �������7>^�~g���yKt�����Q�5%h���)z@9�T�R���
�(B���)9��E:�g=��D����t��h3v��&���K��X�l���w�,�S�P����	�?s��y���b���1Jx�$�l��:�wOM�k2��w����g��F���0=��+r�}�|x
y�.�r����3�q���U����-������0.e9hE�:��� Zh8	vBtl�'��9������J�V����YL����,���������P�5��[���
�O���u;C�7���#����9b������]�V����������u%\��S[�\R���Q��?����� ���,���-�V����P]^)0�]]�r�"{-��]a���n��j�����'���������G���[Q���X�Y:���f&���u
gvM?�#�{��{�F�~����Tc�6E\���\\"�l��}����s	�������tLY�7�_t��x��p&_�gB�P[��fc�
B�� yTnp������>V2-E<|����K��#��U�G�&��O�xG"��{O�����>�rsz	���+?6���-~_��l�g)T��Ap:��1|m"Z�\���xrn��j���l`+1�	�:#2�O����*�Mf�~9d`�
��b�������j2	�CdZB��z����������Y���szaI��>�������B���CUMwVR����(B��<;�>K�
��3�h|���'���p�{x
��0�pdx!������BO��+�%?�2vm�T�u����T����5v���o.�3C��i�@����N�o��,S;�Kg��7���W��=B��Hr���0z�
cxCo������S�������� �����\���F]���]W*�`�������=rW1�`����Y�������3�`���C+��`s��m��-�W���&Q����/f���?4�����),an�9JQ!�4V2���6�6��tb�����bm�ag�#����&����62�6"���v�{�����"�7b#_g�s���l���csF��\���JYe:1YLG	7��+��>-1�aHa!�5������*F��OS�L�K"<�y�RI�/Q�q�Z<�ph���./���f8`������5�l< hk�T<V73|)-��8�Q��n7��������P�x1��#�Z#E
%O �-X6�DT4�y,U|;OpC��a�2�Mc�����fS����K%����x>]h��v�����F���	|`/<:�y�d6gZ����>�
�1�
��������HkM�����c��1����@Y�m�g�Ec����c
�k�	������o�C�����-j
+�_����,��~��G
�	�_�����6y��+�N����N��2�z5�����������nC9��6�6A��7K��w��6o�Q��"�W��g��3�����C�Jz+��^�XZ������j#��?���
]�����V�>H���)�����$��!�����?t���t�����5}����"���
0�]����_�����d��4��6m&:hV���tI���V����<�����{A[n��L�Z
6�wp������
�6��G�������"�$5�i6��t���,UB��`LJ�V�����C���s>�y3+���B��`�p���s�JS�
����h�
}u��#��`�U$��~(��t�v?Yu�7N&&�����1�bE��=���d���p��:�VbA� 2Y�JB����a�N�oY�;�^j2������:��\[��S�9��p}7~s>��Hk�N(d�D����zN^s�����
y����;��kO"��U�K�o���u<&)+��8$M}�|c�-��2���E�:�U��M�P�})G����p)6�\��n/>[x�:��������B"/����{p�y{u
�pb����M!�?�m��1�*�&tW��@W��W�����i:_��-)�O\�8�]b�?�{���h���-�X�����it�R��c���9����N,����f�#������pR@`Z��x��A�w7��=�<�|���}j��Qd������k�GA ��w�l�����0����F�+LU|{."?Y���a���D�J27��`��	��(��(1���������[X>���w�+`a����|/*S%v+��������~������G��Hb������8�zj<~e��J_���~+��p8l6^�zh;���zV����!��u��9��b�$l�������PD��u,��?Sb�E��*�I��K	[�:1����_�O5u-�v�jQ6^�|��E��&�s��b�/o�YA7B����j�l?,�2l�V6�V���=��|���v��&i����������[
�;q�UwT�T�Fw�t�s����	�u	r���D�A��iD�j�I�9�LV�)��y������K�,'i���@Xg,e�3$����k�z���v���t�OeH��D����c�a.�8|��g�uh���&���b�k
H��D`;����aS&E���0S5�_Y��lq���(�Lu�J����a�=�W��)�?��W:���#�<����[��yuk5dX�x[/gSp��/��AL��A��_l��R�o��X��D�$����x�(���
�X�%<�c�A�x$��,����b�UL~���-��}��,�"���1��)���_4;��|L�d�R,I�;R��0���H0��Fe��F;H��)��l6T=$Z���%h�	G�c��������En���G�'�-FL�(�:�Iic���-��~7{������k)���6����[�u&{����ds��f��T��j��:�7w�����*$��-����.�yt�����]�����1Y�]&�*�0�.���.�������d�G�z�^n;Q�
�_]\a�&�YzE�[�����,|5\Og0a�Z�����$1�|$'�s�y����2�B��� vv�xH��������+��
�b�;R�����
xv�$"�+���u��
zJ�]1:TP(�l;��Y���-�����N]���v�P�������[Dq��?�s�0�cd1�Z�s�O���x�z1���T2#���qUQ��T�:��y�9&��X��?��#g���%t�ld�UV���i�iPS�	�l�Q�k��f�d�%�,l�L��m�N38�`X������lm�%.M�h��N%�e�!��!���
;���Z���>j�\oJ�X-h�c�p{��,���g��[rbwO���<�xj��M �r;���h��Z!�����I�C�s���h�BU�BT#u��[�7���4��["~�b4���x8����Y���>��d�p�"�3o�|���.O.z��|zCqIF����r���]�M{	�m���]��7_���+��*f�I�i�(���2)���c�Sfwql�(-�L�4X�B���z���i���������
��m;�d���I ����	�\
;4zV�n���3���{@���K�W�0ru��2�����V�}��&���Z��-�63
58
���X�^��O���r�1��2W�k����Eq�yQl�D�Eq�yQlwEl_G;��En����L��o-���.N�� ������;���<�_�oM���\m�N
,}�� 3E)o��t_�<RP������v�.	w���[���c���[�;!�
�%�-�!/�W)-hH~�;����B�F?#0�hd�[�������i��h���U`����k61�#Z�B:�`g��G/�UgRk;��:��7�����a�me��k��V���8�8F�\j�_z�[W����w�q4�_V,�F��J\�(x�](��Gl/���	����i�d}0�?�$���`�#�([���G���.b3���������5H)�6P
r�b
=c.����iT[���)��#�,}
XX���:�s:�K�{V�U���st�Ai����6�i �=#��O\~��g��,��U���d�iSh���#1Yj����G��V[�������Eo)�k	����`���D���dvn����:TT*���1;�v��tv��`)m���T�=WT��:ddA���pd�*���[�Sg�H�-I�����D&��4�T�*�=X@x��I� 8��xa��8O������U���{-�2������>)k��ll���:;�l��e�g��r��k���e�����������t_�R2~�M�l1-�I[L���b2��\L��c:�����G����)���D~���6���0����f���
���0���X�]��M�J�9�v�q;m=�<��9�&��A.O��H��9,T"�,������CvYRK��\���D��5���BLqy
i��R�@@2��7�� 7�� ?��`KR���y-;����[v�]"Tw�*#v�����U4�V���]���b�=�Q����4�t��&�mt�!�C'�v��W��$�|�R2%Xs���la����'o7�iX�l�����vB`��@'�D�J��K����A���������O�/���j��4!l���	�x:�����>�>[��	�2�����Q-���6�wO�%	J�8�@
����Nv��V��](�2��wxP�f�^�F^�Rq�p��m'����\��]&�!�����x��|�fA��z������S��pB�����WJ�5@4&T��d�,�H�}l@��^�z�C%���n'Nmj^���w}����n�����D\�1�
ejlFf��:�z�Qn<H����s�s����Bi��X+0w	#p�/�U�h����m����3�.v�E��*�a%��O�"w:><|^�/���8k�a�P9��eM-�+�8��
U�2,�@;��[ko����h�x��{�Wku;��Z�O_�f%:�����7�.����PM��Pzx�:"��cur~���@a~rv��]�/N��B�"��Yl�QC��L/yC�� o{y_���Vklc�k��*{�9d���{�f�O���3�2��[����#(.?"l?V��
'������jR��fv�[(�;#'R�sE���
uf+��?t4;���9���*!������Q��@]F@�o��O0�Z��~����2�p�'����TD�n���(��(��TDL��8t����ws#�EWg��(�M��]��!��m��N-��Um}L�4��h�W*����z�����n����������0�,Z]�IQw���	S
�Pk�n��;[��bGs�0r��,t������A�G�>��-�����q=��y�Z�nk�;�hH���dE�e�
�p�-#7iUt��'�"��P�Q,"_�����H�h���V��}u4��+�
���"]mZ�����	1w��9x7=X��f<�����S�O����	���/�.����?���M�G&9UH�B)��������\B��]����D�q���pG�t�n���������j�����gU
5�o\���@u/���8���_�oZ/�4�Z�p�#�V�z�KCG��W����G���}�K��I��O��[�w���3p�����q=Z�*�v���o���jU�_9�V�� ��W�-|���~��f����T���|s�m����0��j��H�L�g%���?p)�����9HB��.��W�~pL�������ik����3�C�]U��o����cu���
�_0���q�9�#s�Zk�r&�����S�\Tj�_s�]�����f���H���9�$JA��wVB��$��k�t^9dCN)��������/N������0q�	���B�����"�����`X���W��k�,4n��'r0�jH��_"I���#|k��'d�mM�?��"iM'(���lzg���rBn��'�
����&� HE*F������������23kR���Dua�A�Q��cM��#[���]�S&�J�\y�)����aT$���AY�����1�C���M����>c����qr���%�
Lo��M(����7����$4*rX�$���{`��K<]�5uS�|�
�l
�yZ}�M?|����b�~����U���5����_�����W��Y���W��@1��5��m�*�ne�������;���2M��?�2K���	�����0/�����'y��W��I�J������r�o��������Cf��D��bt/9�g���P
��CGx��8������D	����O�5uD?�*���?���J�����z�������A�xP?��}X#��<-'�!;o+��ZS�{m�~������nx�Dd?\�d���?�q����=�wox�:��NT������|���D')�������.dOz�:�x�q�)����5:'�c����}�(r;�*���.>�&y������������M|�R��s�"�:�j��"x�&.�Q�.�X�7=��D�7��`�Q6M���Va�D�9��
GE��d0��r�p+.x���D��(�1��hO*��������YU�"�g�YP�_
��%la
�`�v�������%�NX_���h�g���J�kD�f|����RS���V�C����<:����u2�!�.
��`�m���;#���j I�������u�s���E���G��SO�=���(
��8�2HPY���P�#D��5i7G�`�%�K��;<�$ ���~h�/����ED��U�m���:'$����8'',4���&7��%C��(��3�8�Q��4Y]��0yL07-D��(Q:�����?���
p9�,R|��	vBS���z�P�	����$���de�
���LW*d�x�R�z<e�jk������L�<	��������@�������y����X�;FiV��a���X��8%s�0���Cx��c��>��
������vb*%�3%�D��
@0Q����B'�?�/��5[����w'��}��I ��"5p��J�P��5b��QE�Io��`]��	�4Q6@5�a��������7M����W)n�u���f�:V;�����9,��%FP������S�X�WQ�����L���}^�%�E���I��3��Ab�"F�T
=���,��t&��uN�rm�L��/��O�D�����E;X����=2���7����=s�LB^x������cJ|?L�����A�qF�� �����N�so<�6��+��VB��h����#@{���
-����F.
D���<�,��2RX����� �������)�O�v����jl[L�@f�����HEs����~I����	������_@�2$@y���F�7�Y�=j�7��m+��d������f��Y�/i� ������9A��VK�d����h�02<��T�~��A$�T��GI���3
�������|���-XwN��Qu2�������f��J6����������z�	�L6�s���{:����X�6�76Q����&��������x�����������V;���T���tp[��a��:
�#;|Z&~��7��F_B�D02G��A*|��za���~@�x*�o��A��a��t�%���78�� ����/������Z���$��
��%�j��F����'{�10�V���[8�<9��1
�?����p��T�K�!�	�w���6lfrk<�;�����P���x���_�v��Xk�R����
2pG+����Mw��@��y�T�|"��H���${��]S�1A���Y��.�b�� V"����0 P-+��m.����������^ef�W q�+�:��{x7��C}�3��"�68l�ca7#4>�C�u��.9������J	oz���E
BO|&)����vZ��I��V�,�#+��\qwE
$:����C�����*� ����mI�j�2�d���b5Z��1-M6����p���u���������qIh��������%J���%�R	'T(9b�v��<�����:<�|%�Q���'�#IJ�zB�!W_�b�^��	a�������*Y
"#P�(���|�N������@F��+�N��nz��X�\ ����S�6���������2����?�cO��r�`Q����/n(#���B�A��(G��D�y���r���2|/F�Ah�6e,
�:�p~�n+�N1�%�T"_H�$��Np��p]���h$PK��oB)��H]a
q�SC�[�0�"��� EK���c{�p:eV�
�Tf��+S�N���9��7X
��?`�7��=<�6�)f�+c<���D�P!�\R��������@���,)��VUZ�~i���_���S2E2^v`���m�6/��w�����|"E������
n��(��Tz��RJx���������a��q����������*�\�~M��Y������n��|��u�YT�wA��Oh`Q!�L����D����v�SI
��:O�t	���v�8L��,x�hyk�J�Pb�0�[�$>?ysb���D@��P��u�q���'�I�6��4r��L�-�\�_���~c3�)Z��r�����<���i�L5�5W��C{�y���<��������>��h���6K_G�����%u�vEkk�\�a�?g15`z��|�	���NZ��|J�@�+�z$�[/���7w�B���0�����L|�+��Q�SX�Z�R"d�i7�n��}u`P`u���
����S��|C��R�A�6�l�$�b�;N��h:j�A�����2W
d�c�c����~:=���g0�l�1���(,K��,w�a�����XX�W+�uU�`E8�����+S��Z/0�������WG�@��zF(4��t �"9��rY5��>�y���dp+��c?=6�*��/J���	DqQ���
�U!��x�����0�P��0�������aJ������Cj�Sw0��Uk�m'�v�W/cK_�DHX�aB���0!>C�".��4D����������M�rUt��j�&�w�����������l����5��� a���e*�6�52��ha5!��	�!4��Z�������>�7�
���B�������&z�$E���DYR��l����f^&���L�N�5��+o�u[�T����^\����M���������d�J.�	+s�1q����V?�����0�1����q�t���2��<]�w��C5(����}3�{cB/\xP�!���I�)	�e�N"q�C?E$��4�	&3���VJV)����Iu������3y�����>]\p-zZ�TU����������t���`��������b�x���3&��������$?C�_rQ��r9���l�%Z����jw��!�%�m�im�q2��8�"�H��f��	���W�k�6}�z\&/gM���.�dB�F�}��}�i+/]�uj57���J�>y��(����R#}�^N��|Eh�)�H���>�f3�2��xK��CF��a����'�iY�C*����g�,Ze1�����5��U�te^����s���R"� ,l8j11�3i�����B3/���'�N��$�����W/*k��$
E����jp�-8*�I�������#�;xF���q�)�j���G33��*�e�1���%�|�J����G�K]
�E�=(o4�lY��mnTI���[`f11
�'�DkN�Kt�N+����%_��
��:=���M�1k���[��F���
���/��� ���v�X�(M�zm����5g�~�2�2b�)�A���J!�yT'���/+�M���,����Pm�I����u���/e���{1����� &k�����������@�3\y>�I?�����ET�"l���7*�:�;8�����F�
K���?���"��`��@��N)!?�s;P\��% ������_���@w��g{Y��<�v�+�����}L����GD���c.�����:�U��#��X�	:�j�1%���	ch�
��
�2�A����D���a����������s5C}�F��anH��[�����^���[����6&�M&�g��!�-�M�������6�C������mVb��sz�^D�h��	`/��$!L��?QX���9���O��
^'��C��:|���x�@��"qKD������C�e7%�!�Orj�]y������>2}Oz$�Uh(���:|/�I�
F����Ba����ua��� �|��l����L��}&���M���u���x��5r21�F-s���
72e��:��/�9h��%���p.G���)���2}b����_����1�F�����B��w��~�1Q4����Ot�2��bw�����x+2���AG����S�0S�b��H�s�}��E:M�N���\��Q��?���>A��;�!���1��|�����zMv��h�w�p�]��K�G�������+�������O]��2D{��]��	�zl�>Oh�}v�3a�p'��i6��q�c��O]��}�1�b��>8�G �%����D�s�#R,e�������7I}ZM�3)�
�����������&����_��"�����c42/����,5��pv��E�����W� �~P�� �B.�nW?^�`���*���Wq!�"��k���i�
��d�L�f&`0��|�����*��bB���"Ju�n�Ol�@�FY�iE��7x��/2W@��%�5���4��__G�kasY����7=:HY�3��2*$���t$�k�!=�M�{�`�}��$���p�F��������i�$��r�����Y���o�fHw���.�=�aQBv�\o9�q�^PZ�i��4����g(d;�>��G��7nr$�l}I�����*�z���;��?�]�_����L����h��U���
�s����&*�k1#��������~��v����C���]�e��1fz�)^�Ch���H-�&����#���<�%�6��Qm�[�.���,O��5�b7��������Y(��D�/�n��<��>q�����a�������rCV�By^}��*��n4���+�����1�r�M[
�A2tNC����[�Xu(����b�!1���]�k)��0�R�I����}_X�G�����cE������.W�,O��#�������������7z&����O��������#�����'�1�*�!�-�k:F�U�M��}���l9����.�Ik��R��������Z�+����b��ZgD)�M+��BcI���f	����mV�������o�9�O�N�3�|?���lV�z�V�b�[:�r��I��"�$���;�EU/�a���*l>z�u9�6w�B���V������)9w�J�4����_���S������|� �)`�����rp�a��TVT�?�#���?�x]�����{C��0���U2G
���C2U���Q���T��FMD��^ �F��N�� ���*�K��-	�gR�0� ����Yf�H��G��H��H
����@.�$���J7"y�e>G�����P#��=���"����I(1�:!�g-F�+Z�xH:��S�t%����D�[=~�����������r��IU�*e����$[�q�b';�G2a���w�0����[��8�@N�1i�L~�����Z�7�����4�JGRQE2��')+F&>�%ez�w�g=���z"q�J}��$��n3�U�i��=�2WN�Qe�F�q� |��L��\�U��J�(O8{�C|����>.�����5��@CTR���d������],�]j����iA�4�UF�i4i�|��72����}
����GC|;�����.���Y"c���V[���vj�M?yi�,���(G�!9������w2#(�$���F�y��5#��7��b�C�>��nb�a-|��S81���fL��x��}��6���������#�w�a6o|�'��`��v�Mtb&}���?X�����\$������*V����Jq.�5�p���Q[�-���om3�HB39�"a�L�)
�)�1^pL���uu�.�z1�����	�L�8��"��l���6��F,�QG�[�NS�SSo�l�8��s����e����>W����5Ma�x�������SY� w3��1��()K�o2Rn��O��%�s�G������*t�������
9J�������t1!���~���P	��i������g:�b�T��O�p�3��I��9K�d^�x�=<�G�+����?��'n+��k
���T*�Q��7�|�YVi�YV)��ShX��J���A:r��lK����0�����@�4�8�{�����n�\
��dCu�l�U�Uq�^mt��M�b�W�����%u:��0�[���4��L�O�i�O1�����kx9?������Q� h�	�q�e<����7���1I&�1���U ��X�a-M��N7w.�1��7�����P6u��N�]Y��<Gh��U�`W�����[e��������8t�Hv�r=�/�S���FU_g������������Cqj�.}�Q:=���o[�RK ��{�v��������
l�Q�����.�0�S`���F����%�����h��7�o�Y���bu�G�9q��J����mo[����h�xa�UV��
�\�%T��XG������5�!�:2�7:
����*aNh�x�?7�M�aikB#7���t�H�l?/�t�	g�
DqVr�.��	��|�4�;r��+�a���c���_��QC�4a���fm ����B@�#�`
z�UtAd(D�H�A�#��DF�w����N����g���|i���(g-�	�d
�;��d\�G�Fk��RoCU�`�0���~A�Mj��U��	��R/G�GIm�T+�(����.�;���!�o�'({w�-��H�1��94Zo������]�y����q�K���5����H��(�vw��PS���'������7���J�i�e,~�A�V��C�
siU��xuHm�.#p;3`��_�(���lG�i�����`�0�������%�3X��S�ou��Y�9�����
��r<3�>�3������n����+��������8��
��~7��:2�~�7��yP4��a�J8���r���MJ]�|�zU4�[k���������p�+�O"��������&7s�����X��y�o|D�
�h�y[f/96��qlX�.r�Bp��L�;,2�
�I�>����Y�Q����}������9��6GQ��P���.��������-�s"=�0?az���~
mA)a�.5A{�@{�T�.�)���� �#����d;���i����1S�j
s���q�#�i��q]����%z_���V�k�Vw�Bb���t��������4�����2%)hw�%��Z�i�jU-^p�]�D5��Z3�@m�y�%1f��$�B�:
�(+ ���@6�U\.+�qe���vJ��R�{R=��L�=)E��o�ae�b��_������;�o[�M����e���	����:	���u�7�g]GW����]����Kuqr{�}�L�s�p�{���q��:g) �8�����^Dw���q;@��H��^�u�G91cBt@��%�W�<�zg�5��;������H�&��vj�m�i8�������}>f���>F;��B?#�$�!�9Tfh����{���������ci��i�B�������f���M��ct��-ox��s
���&T�� Or>��E/�+k��<�D�
o�j���m ��D��
JB<��h�7�����[#X���)]�i��C%lXdw��3�����2�L�>�"�xbR�.xRp,_3!���-���"I�&��i��l
���B��d��M=��y�C)fX�R$���N��$n#������}f��u�U���x�|A����H:=S�W��T��������x���q�!0�n�0���%1\����4,��7�y-#��d1e&��^�����#��K<]jv��K���z����N�O��:��Z%|�z���Cn�h��N��7b���Q�
��������b��)�N�6�h�"4_E~��+���7�e��l��s��:>�S�}o�[:����QV�v�ek�X�G'�P�}2I
hk�z)������d���jt.R�
��,2~���������$�-J.3v��?�/�T��lq��#��_�Uv��Y��I\#1��w���.`�_D��e�!���n�K+{:Lp#�=�0���9z���y�b�=���W�2.BO=�g��-���|��M��\ 2��_xn��|2����Z�bMe|�MHzU"<�zc�.�1e�6��^�Z�rM�.�����`b	��_�m��� �I�s��o6!syJzH8��� �R�����rv����I?��qw��d���U���	Xk�7������7C|cu#�?I<)t��iCs/��
������=]�S8P,f���/�K��������h�$w���-����Y����
2�+�PE�
�p�"�1����`7�g�r������Z�:���������g����p��F���u�oL����q��jX�������Zj&~����q�����s��M��Dg!>���p�!���W����-��w�7�(��S�q�~�]lm����L��R��Zt;,��G_��k�)L��=O�1���TO������a��Mg|����_�Wm���$�J/����E����k�CI��x�8����v�����=��/=�<�V���YJ��*��������
I�
�Q<�5y���0c/�"����<#���nA�Y��Dx����]�V;�Bv�'���T����UFv?���Yt;_Dk�>F�j���z���LX�J)
��x��V��X�i;,�����3g0e� X��S��!�6��B����w����@��R�\��F�`��NFUO�>�B�%�q[���aw�������J���n����ru�5�j��jvF�F7�F9���mrNA�'�������I�n:�-���X�(��
i�c�����%nE���df��1�J�fZ�>�s�g��d}+���j�!����������.�4:�����2h�K2�AM����_�w>	]�a�QpS���3�!����S�����*����d��A��%%j
����~�fk�}�.dU���s����W**�<=!5|G������X";���1��{�B:�����N�R�MZNw�6�����*H������P�D�j���o�`3��m��@�x��t��o�������C�/�����/��i3����~)q���R��K]I��y��'z��d��jD����f��N�����vE�wG�
�Mr����*��n+��U+��F�5��`4a����|~h�N0��*C���J/�$75,���l�������bD�tK�B<3r/�
l��v	�kY��|>]a��IBv3
�.��D?�2�;Qv:>�:�k09���!�i.I#�G�I�`P(���Y)�����^�����d�4���^�����,��,�#�x��
A��1	�A��9�TX���$1F�FQ&7125����!~7
^2���Tt�)��u��!'��0h���C�z�s���4�Z@)������i�aJm�m���S���`?��Q�Q�C��t��rEH���kh������w�'���xJ-�o���)5����[���?/|E�d�Q��S1�h�[�sZ!�a��� A�m��x�V��q.>S��,wzuq������d�e��+N��^8��'���_��S6g�;��R_�<��dH����j�aHu�g!�����K����������K�Q" �����},��ll���+-�:f�t7�p����R�z-��#a���������b5No �5b����!�+��a{����������^��������7�3C�����l�>d��J`I���������>��������#Bi����%��#\~{6�������������&�&������Y�������7�,����;���x�����F��=�5�B6Z�����|nTP��M/��9��:�CjO�������vl�a�Q���x	�r�����cv�u&y��B+�-�0�vO7�(��6�0I�����b��h�|.�8��F��u[����'�wf�5����>�K�M�7��b�c���a-����g�{J[�R��J���O�����d2MA�C�d3����[I=x.m�'��zN�$1��x1n�bm�����IB���%�5��Q�d��I�s>x/f3��<��Q��W���u��X��C���1�k����jr���R�����p?��q%��~�!�C��({�2��`�_���!+�t���]��/��1,�'����I���n��@hK�QdH
��d������h�j���|i�v�z�"��9"\x05��� ����>��S�9�.������9�=bg���DbO��l���[���U�&�z������h����Q�������.J�1��
�tb"a�hh�����{G����/�����xZr��Ng��;�J�u����p����s����j6	
��
zm��<\k8����f�K0�P��0^�{	_���=��c�FY�o�h2�[����b��"N���t�+2�������;���!���*�!�� e&����3�,�w�u7�E�j�j`����.�y����x��T�a�����g3���tf��OJ�kHr��MT���.~
���_��t�����=��G���6J����5��n.�.N�nH�T1*�����A���}���9�#����U}B�?����#���O{�������8���{n�R�tZU�]���T�	O�]��E?��(��59�����!�h	nuY����7l�b�����PG�&�/S�nyY��o�����f�]��n`Sv�Y����=�����rZ��n�~�����oANC�
#V�$�`�G�,��M�W�����W:&Dq��;��,����l��������Wku;�����f���[o�t�}�VT���|09e�-�;��������=y&6U3�j4��:9;S���'�Q��Et�j��/���@�p�B��Y��a]��[WCGQ��m5��un�
�j$��YI�n��~�����_��������~�����F���tO��c������No�4t��<�o��.(�X����
	�����j����*8�)��������684����ws�n�?��)��>w���)�l�R�V��9�W��oNN����|���ptm�1��r��M����D�h;2��*`@�M�dXa��{{���s�?���,���S�z���2V_r���^�X�1�'����E�e���K�?q�����!�]����#�&��c0S�/���A_�|t`��e�����e����G��L���q�/�"�
�����2�2#e���u��
UJL�c$���.����wvr
j�L��m�C��>��2H��L���a���[\��G���^��[c�V��w�0w{��������k��	a��Qc����Q,������p�=l�p6��J�E�!G�fP�^J�q�kv��\N�(�S��_Z�:�\l�c�P��p������@��
F�A��}�Au	_��k�6�Yv��9�{F&~\m{���R�u��Q�������e�s�c[�eX�5:�t���`<�
X��y:���V���F�z�x�s/����0������s�c��?-KpG������:yw���K�vh���n�#�AL�2�D�g��H�L�A��So���e�V�1�`(�bWE�5m�g�d<&��;K�
a��Q�i��.�}���kiLW�
��|�S��]xloj��#�l0��D��q�5�����R���t��X>y�84;17���A�x��:�G)�h���i��6y��v�����{�iD~��r?���x��B9@��h)A
��0r��������K�/�/�J~�z�����r��lje�����C�������I6(�,>�k<(e=(�Z�`���c��2�����OHv*k���B�`�DD�<��|Y���0:5"=��Vyg�Kjt|��t�4f�������3��Gd���m��e����=����M������7Vx���V�W�{?������B�J>�?����{W��8�U�e��wT��U��o������������i���j���5��o�N�+U�����Y����W���r���7��=��w;n�S��{'g%���?�.*�����9H��������zu��t^���{u1
A��a�qz���
<��=���X����TN[9���s��T�Zk�r�]����v���J��k��6�q*���bu���^�w8�MW�%�M_Q{A�P�T�`�N����:T2v��'�����r�[HQ��K}�[QE�7w���.�P�\}]�{���b�U�]��\�!�^�d��F
,_������
l��,��j�pc����7I
`)
��u�p���T�0f��r�=�Uhx�FR��(�sv��SL����P�?�u��j�/9�����A�����k���R��+�C���yI�p�)��49������7&�{�}�x�D�E������
����T���fSR���%�� |��m=�Kt�5uS��_(�gS0����{o���T�O���
�
�`��i����j��_��\�/|5k��Tk����6Z�v������?���R���2��?�@xy�������h�'
�+�	4��(]rl&yV��������T����'��!����r�U� ��� :���C����E*���	 �������k�������0�������5U�������n��E����N�����s�
'�
��a�3��{q6�p�m�NR�s�
s�m��V�L�a8�����������v�U���H��gk�gS����m���Je��8y��V����X�7=��+�7��`�`~Y��K���``�0c�y��!/0}P.��ImP|�j��xX�9�T:Ng4no �PU:ejCa�b�j���6�L�Is7��Pl
�.iu�����GOE~��(�d������������w����������>��C,��
^���qxT��cw��\.��x�CJ1g�T~EIp��t��$
�*
�->L��~��o)�N�+v�`���$�0ZR��q��sx�?��
���LG�^�����`M�h����#6����{�	,�PR�F������z���7E�.$~E��Q��F�{���E;��oI��b�+����DG1HQ����a��6WF+l���`�U����(	!6�TEH�`�	�
�	�)����|�	�6�B]t��sz�\3�X�bS���w�Iq��*�c�a1�{����4�I�'<y���z��u������T�r���	><��������1J��l�]�,�c}t�in%��q���D'���8lgu��X�j�L�.��I
g�8�Mf�q�`������~R�!l(-��w'��}���I�!�sG}�L7������l���0ez���Z�
1RU����q��
�X��Psf�9<9:��Xq��~��Yx�����Q%��"�0�//
C��V"�����4�G�a���njK�*�FT�d'��n$E/#D@k���J$)9ru7HL\�J(BT
=��R�,��t&'���J�(�k�^r`��G���p\�a6���A�(6����=s� 4T��
�A�r
�]�a�L������&��d�� �L�L�t>��Sh#������GF&z,����*������U��V4�8�#��[P���d��7��I<RX��W�`�yS�v�J����@�sSa����"b���u}d
��}�A�������T�����2���](!�p�z0��nX���G���h#���G���"�m�����S��1�����!	B{y�(�FW{A���6ug�G`�[����@��kI6��������{��0�@��y*��/�'�ug��L<�^��l��d�-k�Q�8�s5��)��L6�s�r�{:���X�9|:��^b0��������$�O^��|y��B�\��-;�`��W����Y_F�V��{7�u�����
YEh��}C�o���o�8T�r�������D���q��)�0TX8�(�m���o�����bl���	�M���
Nnnz���q�_~GA�8��D��^�N[2���7����d������A����Y��o��P�
������ Oz��t5w�x�z��
��)�p!4K5������U	���b�%�
E9���I9���s���������#w�Jr
e��X��K������~||�H�G������C7�����k6���N����u����]-	oq���P���0P�3��e��������39~��@���W q���0�3��a�!���(!�������)���JT3���h�y�[F:%����BP���8&tzu��������������
s����i6WD���wPv4�����P��`����5��
�1���u|����,I�\�e)���d�l���FL&9���TB��Eo�L�C��������lk�tO�_#.Qh;�J��p~�;����q$���4@�D��K:]"
��AD�l>K{}���A����_�L��S��/�p�kG%Vi�
�V:��9$SPb��;
2���d�\����M���j*`�W�"F7VT��������0������{"?5���foI����lN�G9�$���&gs����4� �;�^@�,�`�8"��c�����)��K	�����QV�&�S���������<RWX���x����l�"��� ��U?�3��>
� U(�~[��\!Q��N��8�r���=8\Z����i@�tZ``?j]I�����S&/��-=����*i�P��o��X����mJ0/;0Bj���w�E����Z��yn�"�i������f�p��*Ha��)%(@	[������W�YL��gp��5u�f�����#�����_�E�m$y��x2�?��}����UO%5�.�<��%la�Z��m1&���*{yk�J�Pb�`Z�K���7'fJ�M�o���|(�������?q�-@�L�&��FN����e��/a�l?����������>O+�m�<S�k��������������S��il��jEm��fu�������\�_R�nW����U��PzXL
�Q�q���E���NZ�oA��t���#:l�0�wW<�!tq�C
xA�*PW���Y���ZI5�Z�V�Ts��X/oZ��.8�1����DD�`Wn?�������R�%�M����,WA��*V1��4p	_f�):b]����j7���\��[Ro��}����4����y�z��v!3���-:��@3B���Px�g�vwJ���c>ZS}e
%������=����oyu���>�743��!�C�#�e�@w#�[����:v��
�L������K�Z��v���@f��J\���t� ��t��b�t?����i��D��Kll��V�F1�-�U��@�e��B�h
�U���w0�f�c�?A�z�'H�v_C8�	�?���%w�ig�n������/G����3p~����B=� ��1��!�.U5$�(�A�������M��G|�,���IP��o���n�������8�$f�[.����2�p��Y�@�01q��v�x=��o8���7���~�y��h�7JR���/J����l���O����Qc�l�����]wq@f%�wW�btw�lruS�$L�<���D�n��720`!��h�J��q�bX��������,��4H�`���5"-���d���V���(�Y6jPp�7�+�h
0�+�b����B������o��q�C?]��a|&3���VJV�=}��<P�����<��{�Yd�..��}hP��b��x=�_P8K,8�)=3���3�)C��N�@=�gL�	��/X���C�3���)���3�l@uOE67~-������
��6�T�j#1�*H/�TZt��fnR����a1j
2y?�D����`�M�k��}@���y�C�d/r������c�/i�N�MIy�Z'�G��2\���/5����4\�W�q�A�c��o��[����d��N�4h�+s����Cj��:H2Wl�W@�
%o����@�v;�oW�dN�����s���R"� ���+j��6k�����v�?wJ�&�x(��6�2t�S�S("0�w�V�+o��1BB6/l:=�9����gdjKg������~43C��������6���%L�J�������9E�����5S��mnTI���[`f1����S;-���5��9].!D����2C�;%�"Z�d����;�s������0\���tc��3�O��v���i�������� �����I�q���5�_���
fL����rt����/#��*�<=����Kl�o5?���l��|�i��^�1V*�f�����HTC���)�ER~�~(s�����0�����r.*����[�&��q�����)2#��O�P��O�CYn��U8�G�i����%�]����e��|Pu�V�tHn��lmt�����"9B8�xA	���'������[��7>���2.�3�5�R�=�;P�$����Fpa�sO���X���Rc�������S�+"4o�=q�d����3<��.I���Hf
�p{}ry���"Bo�:n7���Z2�Kf96i��Bx��i}�����I�p���8�DJ���6��x8H���6�Dpg.'`��2m�:@�_�6�Hh}��0o�QW��t:����4�����C{���}��`��a!� P���HDzWAT�&��������9ysr��6�i0y+�������<F:���������%'m�\��;����s8&�*!y����k��
Uz~�:{G6�qr+�v�>��;K��;��A���9�qG�����{�B��Z�����q��\s��fZ����0-���,\�"�#��e���I���<���c��&o�	^u��P�4��Z16�;���x(�&����9h������b����%n�e��R���p5�!��93�t����53]-OM�)S�����V�iO)�������so2Y
��;�}�,}����<����:U`4�;t��xK�7`���l�S��4����x�c����X�/l�({�'4�>�%y9����O�y��<(1	4�
A�l��1N�+p{r
�Ef]���	F�X���?W�o���:�gR�v?!��v�'"S���I����`�������wo9>�/�k�M+��=9��B��w���6	NN��Z|pyu;�����>�{��a���/ �����;0�V�@�O���s�L�\r���4D	"y�����Q���/"�W����i@�7+���A��������|�������/����?�t�Q��uTP�!6��.�?sC�WX�k�l�	�?�Y*��4X�K
��=���>|V���A���Z���c�����i�$�[�f��,�R�wO3�;��V@��������s6mq����
�a>:��l����h���-�D��/�R�=�[%R���G!�g�	�k5~��i9l��jB���+����#��`<���������o��O�����������w�{�5���}Jd��@^�j�7	=O��Gz}�[����XX���R��5T���;<����v���s�Sm�)����������;;���%oX�;x�lt�W�UD�������U*]w2v��~-��
n-�a!����$r�����8�7��c��
�&���m�OE��ktV�:�UJ��S4��f]�e�����F��ou�"��6���������.��,�Q�c$������%�����7z&����O�x������m�y�&�v%e��R�6��w~��������2�K���Z���r��UD���;
����j��wvX�v%��]�8�J���������`�n�����xi�����ur�3�`��:�S@'T�\'������f�&��z�U� ����f��@����nUb��R�E!��C�o�GO�J��R���v��O#��{+�s5t���~�V��I�d�������U�6@nnc�z>XZ�WAhK^�nrp���iTVT�?�#���r�������9����!��-�7:Y6G)F4�y��L�h=fc��&Ul�������{���_�G��4�^�������(g���Pr�[u�o���2�\$�����#�u$o�$����L~��K�����$3��xakM6��$�xL}E�. ���SD:�.o�h��t(�~wYj���H,���'k-���jK�]J��*vR\��D�p}�n�Vi\�X��f�_�Z�w�0��U�[��8�xN)5i�P�T�N>�n����5��Z�T*�Hf:�\eE�����L����gY�XO$c��O�4�w��m��j���}�d����h�i������*�f��+�o/
�S�|�M������q�E�%��12�����W�&C(���l�bG�R�ofBX�kN�YoqlR�Q���od`��!��dy�7�-n��v���?������D���9�V����Z�6������62����
�k�:j��x`:&S���Fy���6#2�7 �P��T���N�&jV��'�K;�3A�a�G�h�7��W�jQ_m^���Y:�{��k�FQ`��C=��Mtb&�>���t��H���x�
T����1	�2\�k�����A���,[L'���f��|r�"E��U�SVT�c<�$��Uf���]��b���'��8��+�q����[�&�6.l�N�zEY�nE;MQOM��k�]�s��N
�]�4�6��������>l
{�C����t8��������INFIY��}��r���x��(�?�`&����vF7U�;��-k�������-0u��s=]����F�����J-]5Hc�dd�?���R��"����MO�?Zj%�z�K��x��[\9�(�\oZ��p[	Mf=���N����;5wC�VViWZV)&��]���(B�
�s��	"L���}�2��YR��q�<����X[J||�����}�Y�%�c��:~�������@	����e��F��y�X����4�v��n��c#b9���R��dl:��r��ry}�o�A!S*72��Q��/
_eu����Fn�����C���5H#�:�������8dw'�w������HN��3��Q�t�p��,K�P��Ke�N�/��������+������*O.�)�5�b�-�@�����;L���I�}��p��d	nFc"�5Z$��
��S6��XAQ����i��h���:���TE��-#^��;�u�__K'�V���^[n�X��e�3��������lJ=/�d��3�
:��L'T�v������R��c����;B����BQ�a���`�\�������+z�#�x�&�{z�Ut�d�	�G+�'"��D~���)��8��JG�����fl������#!i�)�������+���Uouw��tU$3]��Lk-2���(�"�V�S$���G�GIm�T+J%��g����)�i��	������u�[�\��c��* %=0����������Aw�� �k"8v-�|C6���������T�GO�-P����7�r_$f�^�(�z���}�A�V�N&������_��c[`[��0X�/i�K-J>�Zu2ZL#�<ZR:���y�+�4��}��n�9��~ZY������u�N�[j|-����E����m�V�v���Z+�[{i������B��f�����+��LnUq���f�7?t��]�U��o��W^�����%*B$�Y�L��?�8j*:���������c���	2������ ����^2cx������r�Bp��l};,2�
�G�>�����������}������������(�m((IW������y�����8QH��
�	�~u?��6T��>GW`�=^���!cy�@��C��KpM�r��4|
����I5����<����P��k���.K�h��/F{�����C+�/f!�	�~:�����X&G���H��2%)�8���N�S��jU-^p�]�����Z3��l���%�I��$�_����(+����@6�U\.+��	e���vJQ�R�{R=��L�=)E��o�ae�b��_������;�o[�M����e���	�����x,����u�7�g��F�Lx�,R����Kuqr{�}�L�s�p�{�����:g) �8�����^D���cq;@���\�^�u�1cB.@��%�W�<�4gn1��;������H�����
LF�U�-��P��X���1����1��H����Y�C�VC]���Y��w�+�_^�~�h�&u�*�Z���j�7 *��E��������?���jB��gtr;�#��!��������Kd����6����Q6�Z��#�M[�v{�*Y�h�5P�������Hg�8d���U@f'm?8���*�C��$^ ����'��5S��4�,��*�����!%Y��aT"z���l<���<
v���*\���_���r�M�����0��n��J���[5�=�Q<������3��_O�m��2l�&[��a�p��[! B�+�bI���d�.�p��r^��%��AL�I��'��J��H�E�]�B�������������3*���������C�^m6���9Xf��-������q����+�@2��Xu�������9���>W��_��J}r?JC��!����Bq��������C��ja��r���&I��
�M�b}�)P^@���)����X�2x2�u;�m�e����H�6(���6�����'tW�(u���l�����S����U����iW�1�g�Gf*��T��|�����J|��.%z�	���Et��R�)�-Hv+�2Zb���9��M���\&�]���=��}v�������7��r��P@7w����W���2��0u�|4��!6q�N������0b���k�B^.3�U�^���L,���k����5�|"�s��&d&O������:5IuL��������=��]��+��%K��d�(�J�Jc:5�g���I�
�����q��Yg
��h++s��^h��tE��uL1#��&���l/���F�
�j�B�5��X2���
��5a��=�����Tt����,b����6R}�,'�)h�������cq,���N�|�H��
�iTjT_G��4f�*2��V-r [��z�c8h��#��!w-���c��7u�����@����R��^]��� w����������t���Z`k�3�e�DW)�E��aI?��%]�Na
��!��8��~�z2
�d�J&�b�h:���DV�}i��,��Tz�=�G$,���W��I�������09�=��`o�,�K�3�=o�x�����J������p���'���)Exe�>������H,~����b!�Q��E����vU���Av�'{��{S!���TFv?k��~Yd;���5b��Nl[}��w&,\���?Y<f|+SX,��uD�
`�u�`�.A�`3�(<��l&'��`y���~wem�|��p������������}���'j���A���8�>�q�),����H�%g�{��W��\�R�7F��d�dN5�;���$sJ�%���
F��r�&&'*�����(��
���&�'�������\����K&��*3��t*�!`���j1�F�C������GT=��!�?
��<��8��{�/���/I���H�Wo~��t�M��mL\������7oJo4��G��������i��/ +Q{(8��5\���t!�
(h�+?x�����)L��I�y�.8�v�wTe�"�I���v]]��n�1�TZ���io*��f�.�
�@�X6Z,VY$�0�" ��5��p�[.10^j6����'(b��P��������)j���f��Y��dg~n?���i1�|�V�Jj��CD7�aO�Ut�#���l*C�s���xg���*2��;��n��wE��TI��[���Z��7j��?�
k����$+u��VT�n�|z&9�a�H����sV.������#Z�[r*����\!$�Ph����o�������
�M��iHu! !:��^��o��]E�q*���Ms*98<�N�sC�hE��J���>x���+&M�U&��&d�M��gy�/`���%$�D�_*���`AL�-0�=���
�d�1�3�B����i$o�q�Qp�����u���O���Cu�
�K���A�e�!�����Ng�|���H��\���=( Sj#�����G��Xc�����/��BY�������r?�/����O.o����Z��0��Sje�363*'^�������
�b ������B��j��+@������.�2e��t�\'Y������{�;���4�W�h��p��NN���8���l�w�E��y����"�%�$����&���B�3�Q��t�hYd=��������H���F�A�Xv���(
���YZ�uL�nr��
(�����Z��G�>��O�q[�j��@Hk�H��D�Of#�%��"�C��5
=P����w����ozg�j.]�B}�>'�����{!1��1�=?��_%�G�8�����K8�G�2�.�l9�"�('���M�M�
i�����1���	?l�9I�E��F;������7�=L��D�3�B6Z�{����sTP{�M/��8����CjO����5���~��w/������������%�^[
%`�*��ndQn/m�]�*������<��J���q���d�Q��zcSOx��LkLc�c}2��o���4�� o�Z��g�#�������"6������X������d:��::���/
�f�S����z(J*�\��OV�+��hIb*��a\?���D�w#A���Y�Khk�t���>:&�� �|�^�fR	�y���xl�7)j�F��5����-b���I1�6��>����k��~:c�Jn��Ch��!a�re�G�������W�iG
Q�J_�Sc��N�_

%�&1�����4*���>vw��B��VK5AkJ�2n9o=^�?�.<���b|�xeQi���i�b�#Q�������D�Q"�#I���h�c.k���o�:=	���r�s�A���������C�p����K�i6�"~:1�2H�3���r��#��W������W�^<0��kgO��t'N�2V�Q��'�����_�	��zs��;\k8�b���K0�P��0^��	_���}���s�F9�o�`2�[����b��@N���t�+2����(��;���!��)�!�� U&���6�T�w�5kUN��O��IA�0p��M���j����*�����k���QEz~3
b%Vi�=����I}�>E��p?��T��{��q�5Yk���3��l/��\�Y�-��l��b�V��O�q��C���n��@b�`�k��'���l@�>RlL��']���H~-S�j{n�kT*N��lt��S�� =��"��^'lK���RHPz1�=�Ua#�e��0����+�-o����{ak;��
�M*d&����#��)zM)ze���n���������6�{��X��+��^��z�&���	@�	��W(s�[Q���?+��-�W����W:�Dq��;�~,����l��������j�n���9�:�jz��[o�t�}�1:(����CP�R������:9��]���7`qS5�F3P���3���qr����_T����QM������!��������u5tU��fWSnX�&���F�������*��
���M������\	��y���9�xi�m�L�+?�;���y��G�n���������
���:��C������(�s�
�c�<:8�!���
�>9=������O�{J����3�����+���R_���:?sr�'��c����[���{N��GN�!aJ<���v�
P�*V�����^�w����O��D3��-����}������#��?Vj�Ifz5q�n����O��'$�{Ho���+��Io�������w��;��'~�����F�kY�w��|(S38j�K����4B�0%��
,����6�U�J�l{�D�C�����]��NNA-���c��y����GwV�^�)�?��Zr��.�6\�9�W*��S�����W����?�r� �]	 fEN2��/�
�/�B�������3��a���zM=P./"�8rU�r�Rb��_��M�~R��������:���b�C������1��nvFKN�������z�����/���N"�v��,�`���=#?���q}X��������q�sk�2���1H*#�����:�}W03��,MtE��F^��Q��=
_����eY}��Xk�|���1,��X�;���^g��������]�0��@{5w��b����� �c���UV����
���^]E4�NcV�P<���k0�T���xL~ w��'�.j �QR�^��%���q-��jQ��5�o�r~����M
5=�i��CFw���0�c��k��~�]�z���'/��?r�\
>�-]����uB<h�E��7�Os��6��8��#L?�7x���%���e�@�=EK	j�����/%_?f�}d�x� }�W���.����C�kS+C}D���dj�8�hT��0`��������R��2���
>��+s��������v��-/�
�ND��#����_
�S�!��j�O��;>Tf:�u�u]�pC{���%�k
�����2��G���]b��&�z�7J�,��
��+����+pu�A�����j�S�d�(tN�BKc�wT��U��o������Z��v�r�v������W���h7�R�������r��Z�7���������N��n7U��wrVR�����2]�����nhw*��W�~pL��K��W����g��������J�3�u���=xK������q��j�ZC���j�����S�\4����S�k=�����e������;ZQ���#�����{=	�1��G6`���|���*T�QwTov���d�x��W�t<o<���w�C��6���3�F����4��n��������F���1�v��;��Z^�ku�]
)����}�%o�K�c���
�w�I��~lITh}w���8�
�#]W!���2��B���[���g1�����p?Y��/n1��c�0d��z0C�n��tj�?������k����<RhO���dC���.A�p������!|��Ff��@&�d����j�z����]P��W�W y��F��F���VM}��;��C�FU�Q��c���V�/:�o�,���9Uu����X����FF}�P�R
ut'MLd5�h��z��)��l�]�}�A�x�[���;�f�;�T\��x��oH�V�1��v8���w�����O6�"B)��:'���.��������������7�"��#�����S\^���)�ZBL��V�����z���R���*�k��0
z�wz���s(6�1ecX@����3�����9�I�{.c�!�q���7�x������?<�0�	�!B���$6���1�7�2����x�S�6pa���	��K%A
Q<wT���v���)�1T�(�1�v�P���Z�-��� P����b�2Jf6���E��h��)�y���0���1-���J����J(�S�C�
��������|qo6�O�@ �B��*E3��
��3���3|0�(
�2���b���m����u��o��{wu}�;�Q��Xz�r��jf�����g�.nW�h�0N�:]Fe����i�`'���C8��.g��w�WoN����ywr�\��v���'����������m�z�~�K
a�%Zj�A��1.����|?���3�x�.��k����<DY�I-�<��
TSJp%B|�F��[�w�[���/G{��J\`�z��\�~���]����&�D$�'N�0�8��������o+������������mC	5�]!)V�0C��C�_�LX]D�xIoZ�V�WsjH�PhS~I
Yvh��d�%����������1�
�Gz�v��$Ubm��09Z�Gz
���������)��@ ��2�!�����i��o��64xN�9���(��mE8������99u�T�.����P�a3�3����gS�PY	��TRc���5��"=�d���K������vkH��1���+���!�� ����w4X����S��byT��4G,�(7[b��v���kk��2��GV����p
�EI�b|����\���fps{u}�]/���G���i���0�lyN�Q�6��[�����3���c�����VR���2�	0*t��#��P���wP��X�+������$���a4F����]�E�
E|���D)�nieJ_�LTh���`�%�W�:����W�"�.�e�m��A�r�@���y9K%��*���KK��O�1����S��#CmXw�N���1�U�1���1���;�H
Gu���@����
"�YlV���i#=uT��Z9yP����^��Ap�vw*X�b���h���"��1�2�s��,Y�vL�^�6\�r���"��/s,-?�X��e����:_��:_��?��w<���[��/�<��|��t��?�������eK����GR9�re���1a�t$��Df�#i�5l��d)������!0��������.'��H���m��I`Yp���,M�����V�P�/��$���U���
��cK�g�"���G�OzDh���#�O:G�7��8I�-p����	��b%�����W�wokr�y��:�+�H��SCY��W�E����E��!��f
M�/l�p�.��)�A��j�$�]E����xkz\�Ao�-/��W�Y}��;��Oj^�(�t��+�'�E�Q�%Mg���1�]c��(���#��njv����s��?D���?����S��]0ik��x���/��H��z<�-b����1�P���Ek����4/�Z�Yj�hv���=Cl;�]�7|��v�D���CU��.�L�}+����������Ss���t���#��L�EjV%�v�����qS��M5�����_�6&�YyF���&��#����Kl���rg���gN��"Mw����h���.Q�;������w���
�wyO����������I[�4|��;��6��5�����f|���<����B�,l�`�A�������4I�����4�E���e��l��n-��}��a�Z�T��SM6@\dV�N��,�@�s!�v�{�"��z���������a����L���)�H����������������"����
��N�t��w�J��:��d�d<0���~�K�����-����n~��e����\��������?�x�Y�w���3p�{���~6��;
��T)��^m����W�m�Z�O����G���qc3�1���%����
��i}����5��UN��Z;n:���������a��p:��K,�e�yA�0��
�� ���P��1ll���#}�c�`�!aA'��0�iE���N1%��}%R�2>0'�\���������S�� �v�{s?x�����i%���"#kD{��	)�?["��X�<���;�aGAqN���`L1���C����e*���>�"���SlS���b�5q��[������v���������
�9������VB+2zH���c�m,�d���'F,O�&��d��v�T��
�:fS�_��%�� |zr���Q���������-��fS/��V�{��`0-����od4�l�
'�����2�R	@+���J5j���_�~�J��S9��$������
���(I"��!H��h6�����~������{Z>���Z���5U���qX������j�w��
���IfGv�7<� r�u2��~@\���6|wKmx����������{1z�-�����-w����3��(.�;�'%?���M���v�U��T��gk�gS�n��m���Jel,y����H#�Foz8�������l�F"����5�j���XP#d��5��J��tF��$?�jCU���
����0�W��oc8��o0wC�_�����'�0�k0��`�,@g���;X��?]��40�Y������.����������+�	�=�0�Z���������1�E�S.��������p��=F\!�sx$��q��7�b����K�+��`���#.�G���![��(nT�yz:���#D�
+�q��/����p��GlI�U�M��P�!���DU��p����G7��Q�,�GNQ�vF��4pI�S��+��JtQ�X�a�T6�M+l���KpRY�G�A���*��vC�s���\>��i�.:;��;T�w,L��`�J�2']�QB��1�5������HO�$���y��]E=c��y{e,�g*t�y���u��E��Q�Ud@�"t`����f�)���e$b����
����Idn#
�'5����6}��^��%������~R5@i��;���c�UO��%����V���qk�05*�[H\��\�$U�S����87x~b]�C�u�!f����!c�M���f�U�[�CF=��?�6���	(s�l<�Q���$�Ay0��S�X�WQ�0�/&;�p#)��H}L.r��HK��+�
��UCO$�	l�H���iQ8`��`UW�*�#��[��<�f{%4�8%1�F����g�L���X��s!�G��]�a�L������&��d�� ��
�����x
m����X�Z����D���f��������h+e�����n���RWo�A�x��.]��k�����Z��V��%_�����]�=�����$E������*zN|'.�`)� ���_$L$��g`>���_����������P���f�z\b�b���;���YH����=�<w���+�[��|d��b%�4�a��Cu
"��Z�>������Fj7�����K�^`�Yo�=[�v4l5'���J6����rw*��� ��'>���&~������?9�~������� o/1'�����z=O|��>_�aD���#4��v[HS%X��s����[a�����i����7��j
��X�|V~caB3}�l�%�$�>x�!Z����>X8���m��(�%��������	b����
Nnnz��"c�_~G��:^��;�����L�4BA��g

�)��U<�
�O���:����jQ!9��1�y��f���K<X���g����W8�g5�"����U	����5�%�
E����I9FD�������&���:2pG+q��j�z�b`�p��2�(�����i�v����5)��D��f��tn[<Z'�.�e���w��
�`H�1�AO�Z����]���L~��(1YCj�+�8�K�a���w����>�?!c?���_G��U8!��QjU�7����\��-n����+An��������)e@����b���Z�f?���B�	3�V��m��~8�Nm^�;-^����E+P���q�Nz0��!j�Z����pJ�k'����+�������v��|iA�j�hK���������F������5�#	-�J�P&�u�h]��	�Y�$��1�G��D7j����$�W/��u�����1\��CM!�_��ML��BHGl��(�c��@K|g���z����y�����P�(+���������x��>��@���=�����X�����os��}���y��o�*�����_F�xC8}�`m���y�u�r��.�N�,��������N������0�������o���GZk0���6�k(��Q���|� ������zl��A�*�yQ��I}� /� ���`5Pb�1�3������F��S0�C��R�J�B-���KL���'�:
�*-�����~G(N)�+�eFH����n��hp~�X2X8�;�W�z�\^��t�SR"���U
�0���������tTU������zB��QQ��d���������� 1��f�2�|��e%�nW����J*�]{j�����6��0���W�+\KUz��G	t����3%�����|l>���B��u��8����RVN<����e�c�Pu_���~0c��)�=:�m��q������y�&��+W����i�<Yu���Mi�j�x]�����mz!I=�]��6W�n��e5�����tO�Kt�
J�I�hKo4�P��>r�0GxW��!t)k�C���8�/U����[U�n��R��s,�K��u��Ni��+ar��'z��gTK"mg�2JY����zxh�U�8��Q�N9���W�r�>�����X����^r�W	 ������z�����5-�A�'d����3	�#���a~�0l����������
2E'�X����{p�;�����������GN�+cv^��������E�m�����%9�"1�<������:,�s���t7N{l��p��</���JMW�0�W�
`b���b��������i����k�dDL�����;B�o�k5��]��B����S������(���"L���6�������+{��u�p����������W*��v������}������5�]�����p�w�!�p$$dw��&�C��[�g3�jH>R"lv�EJ������xoH�<A���f���i���-)�21�����s�c��eh���
�F���&��V��2KC�����C�r����A���E����_4�%)���%����+F�y�+"7X�F�i;�J����5���+^I�
V���9 �bD3A�6 �?Q�������}��#,����_�-?.��S��}�������%<p�;�8"7�P|�q\�0�>s�����@�Qt�l���0KnJ�b����Z�����1>������#~��
wxM�;����R�J�A�Ku���b�g���y��s������<��kAL���&\3��~AA-�h�0�)=3�A�3:+C��_R��1�C|������������$?C�_8��������J^�
�n��p�5@z����������d��'�X��M�L�m'��b�
d�����W�k�6}��u�[Z5;��rP,42�sh���cXM�e����c��j��^�o�����qN�]_j����)%��N�V�g6��~�]�2�~9M7'�|sv��9�g?�Clv�ClIQWO�Q�\���?�����N�j�V�(��-g���?r~)�qb�X���|���nmv��������J;_�_Q�' ,Bc!*�EB6�\�����4HDX�n@����nU]� ��d�8���k�u���e�����?SJ�%{(���1�����cJ�,������[pxI}��`:|W�����3������jev��ff�Y���"�m��4�;�0/��*�-���Xr
�D�=0o�@��2��^��bq���Be��@	�"�5M�9DV����+���2g ��	e�����Et�I##w�-��������\��y<�������>��
l��������� �����I�h���5�i�~�p���o�5�'d�����/C��2�,>����K��o?��t��|/����i6�����M���f$�![	�0�&�/F���g�[a%y�3�2v�?%C!����`R����q�	2z�u ��
t�
�����Xt�c��K�G'����X��S3�Yn'YM���v��!����T.a��|r?���������t���ZE�nM\�t�5��g��5��V�p~b�Y�ZTc�D�����J�����%?/��9����1I���-�4E�3J'�m��t<��i���z��6R
-���S�)��E�N"���S�B{
�SB���,��M���4�����0Ni(�27$���0�o�p��	�n����8�D��Z����1�J��:�F������|�t�.�|��Wp{
���a0����E�
#�0�O?+
������n�8N��WyF/|���������I���:7N���*�*:Z� |/�I��Gw����6.���Uz2����<�V;�,��x�Y��6Mr��lH���Z�KC ��IwX���VCpo����I�]
g��(��>K7����w=���d�#�������k�� ^���kE�}w7���F��sPa�����:B���
UQ��@eI:V^���3�N(�m�3������Y�=ik��i�SN�x rhl�<�T������E�KO%�f����R��~�&zGf�r�E��$������;C���W���lp���cp��,,|6�"E�'<0d�RyP�x��,Wo�&k��
#70Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#69)
Re: race condition in pg_class

On Thu, Sep 19, 2024 at 02:33:46PM -0700, Noah Misch wrote:

On Mon, Sep 09, 2024 at 10:55:32AM +0530, Nitin Motiani wrote:

On Sat, Sep 7, 2024 at 12:25 AM Noah Misch <noah@leadboat.com> wrote:

https://commitfest.postgresql.org/49/5090/ remains in status="Needs review".
When someone moves it to status="Ready for Committer", I will commit
inplace090, inplace110, and inplace120 patches. If one of you is comfortable
with that, please modify the status.

Done.

FYI, here are the branch-specific patches. I plan to push these after the v17
release freeze lifts next week.

Pushed, but the pushes contained at least one defect:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=akepa&dt=2024-09-24%2022%3A29%3A02

I will act on that and other buildfarm failures that show up.

#71Alexander Lakhin
exclusion@gmail.com
In reply to: Noah Misch (#70)
1 attachment(s)
Re: race condition in pg_class

Hello Noah,

25.09.2024 01:43, Noah Misch wrote:

Pushed, but the pushes contained at least one defect:

Please look at an anomaly introduced with a07e03fd8.
With the attached modification for intra-grant-inplace.spec, running this
test triggers a Valgrind-detected error for me:
==00:00:00:09.624 319033== Conditional jump or move depends on uninitialised value(s)
==00:00:00:09.624 319033==    at 0x25D120: DoesMultiXactIdConflict (heapam.c:7373)
==00:00:00:09.624 319033==    by 0x25B45A: heap_inplace_lock (heapam.c:6265)
==00:00:00:09.624 319033==    by 0x27D8CB: systable_inplace_update_begin (genam.c:867)
==00:00:00:09.624 319033==    by 0x33F717: index_update_stats (index.c:2856)
==00:00:00:09.624 319033==    by 0x33FEE2: index_build (index.c:3106)
==00:00:00:09.625 319033==    by 0x33C7D3: index_create (index.c:1276)
==00:00:00:09.625 319033==    by 0x451000: DefineIndex (indexcmds.c:1216)
==00:00:00:09.625 319033==    by 0x48D091: ATExecAddIndex (tablecmds.c:9156)
==00:00:00:09.625 319033==    by 0x483F8E: ATExecCmd (tablecmds.c:5302)
==00:00:00:09.625 319033==    by 0x483877: ATRewriteCatalogs (tablecmds.c:5186)
==00:00:00:09.625 319033==    by 0x482B9A: ATController (tablecmds.c:4741)
==00:00:00:09.625 319033==    by 0x4827A1: AlterTable (tablecmds.c:4387)
==00:00:00:09.625 319033==

Perhaps current_is_member in heap_inplace_lock() should be initialized
before the DoesMultiXactIdConflict() call as in other places...
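
For illustration, a minimal sketch of that initialization, mirroring other
DoesMultiXactIdConflict() callers in heapam.c (surrounding context assumed
from the stack above; hypothetical, not a committed patch):

			bool		current_is_member = false;	/* initialize, as elsewhere */

			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
										lockmode, &current_is_member))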

Best regards,
Alexander

Attachments:

intra-grant-inplace-mod.patch (text/x-patch; charset=UTF-8)
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 2992c85b44..3339c9f400 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -90,9 +90,9 @@ permutation
 
 # inplace wait NO KEY UPDATE w/ KEY SHARE
 permutation
-	keyshr5
 	b3
 	sfnku3
+	keyshr5
 	addk2(r3)
 	r3
 
#72Noah Misch
noah@leadboat.com
In reply to: Alexander Lakhin (#71)
1 attachment(s)
Re: race condition in pg_class

On Mon, Oct 21, 2024 at 10:00:00PM +0300, Alexander Lakhin wrote:

Please look at an anomaly introduced with a07e03fd8.
With the attached modification for intra-grant-inplace.spec, running this
test triggers a Valgrind-detected error for me:
==00:00:00:09.624 319033== Conditional jump or move depends on uninitialised value(s)
==00:00:00:09.624 319033==    at 0x25D120: DoesMultiXactIdConflict (heapam.c:7373)
==00:00:00:09.624 319033==    by 0x25B45A: heap_inplace_lock (heapam.c:6265)
==00:00:00:09.624 319033==    by 0x27D8CB: systable_inplace_update_begin (genam.c:867)
==00:00:00:09.624 319033==    by 0x33F717: index_update_stats (index.c:2856)
==00:00:00:09.624 319033==    by 0x33FEE2: index_build (index.c:3106)
==00:00:00:09.625 319033==    by 0x33C7D3: index_create (index.c:1276)
==00:00:00:09.625 319033==    by 0x451000: DefineIndex (indexcmds.c:1216)
==00:00:00:09.625 319033==    by 0x48D091: ATExecAddIndex (tablecmds.c:9156)
==00:00:00:09.625 319033==    by 0x483F8E: ATExecCmd (tablecmds.c:5302)
==00:00:00:09.625 319033==    by 0x483877: ATRewriteCatalogs (tablecmds.c:5186)
==00:00:00:09.625 319033==    by 0x482B9A: ATController (tablecmds.c:4741)
==00:00:00:09.625 319033==    by 0x4827A1: AlterTable (tablecmds.c:4387)
==00:00:00:09.625 319033==

Thanks.

Perhaps current_is_member in heap_inplace_lock() should be initialized
before the DoesMultiXactIdConflict() call as in other places...

heap_inplace_lock() ignores current_is_member after computing it, so let's
just pass NULL, as attached.

Attachments:

inplace129-DoesMultiXactIdConflict-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Stop reading uninitialized memory in heap_inplace_lock().
    
    Stop computing a never-used value.  This removes the read; the read had
    no functional implications.  Back-patch to v12, like commit
    a07e03fd8fa7daf4d1356f7cb501ffe784ea6257.
    
    Reported by Alexander Lakhin.  Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/6c92f59b-f5bc-e58c-9bdd-d1f21c17c786@gmail.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index da5e656..82a0492 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6260,10 +6260,9 @@ heap_inplace_lock(Relation relation,
 			LockTupleMode lockmode = LockTupleNoKeyExclusive;
 			MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
 			int			remain;
-			bool		current_is_member;
 
 			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
-										lockmode, &current_is_member))
+										lockmode, NULL))
 			{
 				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 				ret = false;
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
index b5fe8b0..4e9695a 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -63,6 +63,30 @@ step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
 step r3: ROLLBACK;
 step addk2: <... completed>
 
+starting permutation: b3 sfnku3 keyshr5 addk2 r3
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfnku3: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step keyshr5: 
+	SELECT relhasindex FROM pg_class
+	WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+-----------
+f          
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step r3: ROLLBACK;
+step addk2: <... completed>
+
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
 step sfnku2: 
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 2992c85..9936d38 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -96,6 +96,14 @@ permutation
 	addk2(r3)
 	r3
 
+# reproduce bug in DoesMultiXactIdConflict() call
+permutation
+	b3
+	sfnku3
+	keyshr5
+	addk2(r3)
+	r3
+
 # same-xact rowmark
 permutation
 	b2
#73Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#57)
Re: race condition in pg_class

On Thu, Aug 22, 2024 at 12:32:00AM -0700, Noah Misch wrote:

This move also loses the optimization of unpinning before XactLockTableWait().
heap_update() doesn't optimize that way, so that's fine.

In this other thread, I'm proposing to go back to unpinning:
/messages/by-id/20241027214035.8a.nmisch@google.com

#74Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#13)
1 attachment(s)
Re: race condition in pg_class

Hi,

On 2024-05-12 16:29:23 -0700, Noah Misch wrote:

Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>

Make TAP todo_start effects the same under Meson and prove_check.

This could have caused spurious failures only on SPARC Linux, because
today's only todo_start tests for that platform. Back-patch to v16,
where Meson support first appeared.

Reviewed by FIXME.

Discussion: /messages/by-id/FIXME

diff --git a/src/tools/testwrap b/src/tools/testwrap
index d01e610..9a270be 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -41,12 +41,22 @@ env_dict = {**os.environ,
            'TESTDATADIR': os.path.join(testdir, 'data'),
            'TESTLOGDIR': os.path.join(testdir, 'log')}
-sp = subprocess.run(args.test_command, env=env_dict)
+sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
+# Meson categorizes a passing TODO test point as bad
+# (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
+# directive, so Meson computes the file result like Perl does.  This could
+# have the side effect of delaying stdout lines relative to stderr.  That
+# doesn't affect the log file, and the TAP protocol uses stdout only.
+for line in sp.stdout:
+    if line.startswith(b'ok '):
+        line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
+    sys.stdout.buffer.write(line)
+returncode = sp.wait()

This has the issue that it causes the testwrap output to be buffered, which
makes running tests with `meson test -v <testname>` update the output less
promptly, only updating whenever the output buffer is flushed.

That's not the end of the world, but it'd be nice to get the output more
promptly again. It doesn't matter that much when running the tests normally,
but if you run them with valgrind or such and you just want to see the first
failure, because it's going to take an hour to finish all tests...

The easiest fix is to just explicitly flush after each line, as in the
attached.

Greetings,

Andres Freund

Attachments:

v1-0001-meson-Flush-stdout-in-testwrap.patch (text/x-diff; charset=us-ascii)
From 952068e2e6912698b291ee02d3682349d291a42d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 23 Sep 2024 11:36:37 -0400
Subject: [PATCH v1] meson: Flush stdout in testwrap

Otherwise the progress won't reliably be displayed during a test.
---
 src/tools/testwrap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/tools/testwrap b/src/tools/testwrap
index 8ae8fb79ba7..21105146c9d 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -61,6 +61,7 @@ for line in sp.stdout:
     if line.startswith(b'ok '):
         line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
     sys.stdout.buffer.write(line)
+    sys.stdout.flush()
 returncode = sp.wait()
 
 if returncode == 0:
-- 
2.48.1.76.g4e746b1a31.dirty

#75Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#74)
Re: race condition in pg_class

On Tue, Mar 18, 2025 at 03:03:52PM -0400, Andres Freund wrote:

Subject: [PATCH v1] meson: Flush stdout in testwrap

Otherwise the progress won't reliably be displayed during a test.
---
src/tools/testwrap | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/tools/testwrap b/src/tools/testwrap
index 8ae8fb79ba7..21105146c9d 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -61,6 +61,7 @@ for line in sp.stdout:
     if line.startswith(b'ok '):
         line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
     sys.stdout.buffer.write(line)
+    sys.stdout.flush()
 returncode = sp.wait()

Fine with me.

#76Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#75)
Re: race condition in pg_class

Hi,

On 2025-03-18 12:17:41 -0700, Noah Misch wrote:

On Tue, Mar 18, 2025 at 03:03:52PM -0400, Andres Freund wrote:

Subject: [PATCH v1] meson: Flush stdout in testwrap

Otherwise the progress won't reliably be displayed during a test.
---
src/tools/testwrap | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/tools/testwrap b/src/tools/testwrap
index 8ae8fb79ba7..21105146c9d 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -61,6 +61,7 @@ for line in sp.stdout:
     if line.startswith(b'ok '):
         line = line.replace(b' # TODO ', b' # testwrap-overridden-TODO ', 1)
     sys.stdout.buffer.write(line)
+    sys.stdout.flush()
 returncode = sp.wait()

Fine with me.

Cool. Pushed.

Thanks,

Andres

#77Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#13)
Re: race condition in pg_class

[ blast-from-the-past department ]

Noah Misch <noah@leadboat.com> writes:

I'm attaching patches implementing the LockTuple() design.

inplace010-tests-v1.patch from this message, committed as
0844b3968, contains this bit:

new file mode 100644
index 00000000000..0367c0e37ab
--- /dev/null
+++ b/src/test/regress/sql/database.sql
@@ -0,0 +1,17 @@
+CREATE DATABASE regression_tbd
+   ENCODING utf8 LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+

It emerges that the "ALTER DATABASE regression_utf8 RESET TABLESPACE"
command is a complete no-op [1]. I am guessing that that was not the
behavior you had in mind, and am wondering if we are losing any test
coverage thereby. Did you have a specific reason for manipulating the
TABLESPACE property and not some random GUC setting?

regards, tom lane

[1]: /messages/by-id/30783e-68c28a00-9-41004480@130449754

#78Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#77)
1 attachment(s)
Re: race condition in pg_class

On Thu, Sep 11, 2025 at 12:36:04PM -0400, Tom Lane wrote:

inplace010-tests-v1.patch from this message, committed as
0844b3968, contains this bit:

new file mode 100644
index 00000000000..0367c0e37ab
--- /dev/null
+++ b/src/test/regress/sql/database.sql
@@ -0,0 +1,17 @@
+CREATE DATABASE regression_tbd
+   ENCODING utf8 LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
+ALTER DATABASE regression_tbd RENAME TO regression_utf8;
+ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
+ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
+

It emerges that the "ALTER DATABASE regression_utf8 RESET TABLESPACE"
command is a complete no-op [1]. I am guessing that that was not the
behavior you had in mind, and am wondering if we are losing any test
coverage thereby. Did you have a specific reason for manipulating the
TABLESPACE property and not some random GUC setting?

I have no specific notes or memories about that RESET TABLESPACE, but I likely
wanted "SET TABLESPACE pg_default". RESET TABLESPACE may still have the
effect of loading a catcache entry, as a later comment says:

-- Test PgDatabaseToastTable. Doing this with GRANT would be slow.
BEGIN;
UPDATE pg_database
SET datacl = array_fill(makeaclitem(10, 10, 'USAGE', false), ARRAY[5e5::int])
WHERE datname = 'regression_utf8';
-- load catcache entry, if nothing else does
ALTER DATABASE regression_utf8 RESET TABLESPACE;
ROLLBACK;

That originated to exercise catcache.c code specific to toast_flatten_tuple()
for an inplace-updated catalog. (Commit af8cd16 later removed the
inplace-specific code.) SET TABLESPACE rejects transaction blocks, so that
would need another way to load a catcache entry. I'm inclined to change it as
attached. This doesn't reduce database.sql test coverage of dbcommands.c or
catcache.c, so I'll bet it doesn't lose anything vs. the original database.sql
commit. Some check-tests runs showed it adding database.sql coverage of
catcache.c:1908-1911 and catcache.c:1983-1984, so that's a bonus.

[1] /messages/by-id/30783e-68c28a00-9-41004480@130449754

Do you plan to back-patch that? That should dictate whether I back-patch the
test change.

Attachments:

no-reset-tablespace-v1.patch (text/plain; charset=us-ascii)
diff --git a/src/test/regress/expected/database.out b/src/test/regress/expected/database.out
index 4cbdbdf..6b879b0 100644
--- a/src/test/regress/expected/database.out
+++ b/src/test/regress/expected/database.out
@@ -2,7 +2,7 @@ CREATE DATABASE regression_tbd
 	ENCODING utf8 LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
 ALTER DATABASE regression_tbd RENAME TO regression_utf8;
 ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
-ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 SET TABLESPACE pg_default;
 ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
 -- Test PgDatabaseToastTable.  Doing this with GRANT would be slow.
 BEGIN;
@@ -10,7 +10,7 @@ UPDATE pg_database
 SET datacl = array_fill(makeaclitem(10, 10, 'USAGE', false), ARRAY[5e5::int])
 WHERE datname = 'regression_utf8';
 -- load catcache entry, if nothing else does
-ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 RENAME TO regression_rename_rolled_back;
 ROLLBACK;
 CREATE ROLE regress_datdba_before;
 CREATE ROLE regress_datdba_after;
diff --git a/src/test/regress/sql/database.sql b/src/test/regress/sql/database.sql
index 46ad263..4ef3612 100644
--- a/src/test/regress/sql/database.sql
+++ b/src/test/regress/sql/database.sql
@@ -2,7 +2,7 @@ CREATE DATABASE regression_tbd
 	ENCODING utf8 LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
 ALTER DATABASE regression_tbd RENAME TO regression_utf8;
 ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
-ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 SET TABLESPACE pg_default;
 ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;
 
 -- Test PgDatabaseToastTable.  Doing this with GRANT would be slow.
@@ -11,7 +11,7 @@ UPDATE pg_database
 SET datacl = array_fill(makeaclitem(10, 10, 'USAGE', false), ARRAY[5e5::int])
 WHERE datname = 'regression_utf8';
 -- load catcache entry, if nothing else does
-ALTER DATABASE regression_utf8 RESET TABLESPACE;
+ALTER DATABASE regression_utf8 RENAME TO regression_rename_rolled_back;
 ROLLBACK;
 
 CREATE ROLE regress_datdba_before;
#79Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#78)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Thu, Sep 11, 2025 at 12:36:04PM -0400, Tom Lane wrote:

It emerges that the "ALTER DATABASE regression_utf8 RESET TABLESPACE"
command is a complete no-op [1]. I am guessing that that was not the
behavior you had in mind, and am wondering if we are losing any test
coverage thereby. Did you have a specific reason for manipulating the
TABLESPACE property and not some random GUC setting?

I have no specific notes or memories about that RESET TABLESPACE, but I likely
wanted "SET TABLESPACE pg_default". RESET TABLESPACE may still have the
effect of loading a catcache entry, as a later comment says:
...
I'm inclined to change it as
attached. This doesn't reduce database.sql test coverage of dbcommands.c or
catcache.c, so I'll bet it doesn't lose anything vs. the original database.sql
commit. Some check-tests runs showed it adding database.sql coverage of
catcache.c:1908-1911 and catcache.c:1983-1984, so that's a bonus.

Thanks, that looks sane.

Do you plan to back-patch that? That should dictate whether I back-patch the
test change.

That change will convert some commands that are currently no-ops into
errors, which doesn't seem like a great thing to do in minor releases.
However ... it might still be okay to cram it into v18. Do you have
an opinion about that?

regards, tom lane

#80Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#79)
Re: race condition in pg_class

On Thu, Sep 11, 2025 at 03:34:47PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Thu, Sep 11, 2025 at 12:36:04PM -0400, Tom Lane wrote:

It emerges that the "ALTER DATABASE regression_utf8 RESET TABLESPACE"
command is a complete no-op [1]. I am guessing that that was not the
behavior you had in mind, and am wondering if we are losing any test
coverage thereby. Did you have a specific reason for manipulating the
TABLESPACE property and not some random GUC setting?

I have no specific notes or memories about that RESET TABLESPACE, but I likely
wanted "SET TABLESPACE pg_default". RESET TABLESPACE may still have the
effect of loading a catcache entry, as a later comment says:
...
I'm inclined to change it as
attached. This doesn't reduce database.sql test coverage of dbcommands.c or
catcache.c, so I'll bet it doesn't lose anything vs. the original database.sql
commit. Some check-tests runs showed it adding database.sql coverage of
catcache.c:1908-1911 and catcache.c:1983-1984, so that's a bonus.

Thanks, that looks sane.

Do you plan to back-patch that? That should dictate whether I back-patch the
test change.

That change will convert some commands that are currently no-ops into
errors, which doesn't seem like a great thing to do in minor releases.

My default would be no back-patches of the above, but one could defend either
choice. The "error if some pg_db_role_setting entry exists, else no error"
behavior (postgr.es/m/1791672.1757607520@sss.pgh.pa.us) means the no-op
outcome already isn't 100%. That reduces the risk that someone is relying on
the no-op, but some risk remains.

However ... it might still be okay to cram it into v18. Do you have
an opinion about that?

Feels like something I wouldn't do, but I don't have a concrete reason.

#81Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#80)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Thu, Sep 11, 2025 at 03:34:47PM -0400, Tom Lane wrote:

However ... it might still be okay to cram it into v18. Do you have
an opinion about that?

Feels like something I wouldn't do, but I don't have a concrete reason.

I'm content with making it master-only.

regards, tom lane

#82Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#81)
Re: race condition in pg_class

On Thu, Sep 11, 2025 at 10:12:59PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Thu, Sep 11, 2025 at 03:34:47PM -0400, Tom Lane wrote:

However ... it might still be okay to cram it into v18. Do you have
an opinion about that?

Feels like something I wouldn't do, but I don't have a concrete reason.

I'm content with making it master-only.

I've pushed the test update to master.

#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#82)
Re: race condition in pg_class

Noah Misch <noah@leadboat.com> writes:

On Thu, Sep 11, 2025 at 10:12:59PM -0400, Tom Lane wrote:

I'm content with making it master-only.

I've pushed the test update to master.

Thanks. I'll go deal with the original issue shortly.

regards, tom lane