Proposal for discussions: Allow reads to proceed during FK/trigger drops by reducing relation-level lock from AccessExclusive to ShareRowExclusive

Started by Shayon Mukherjee6 months ago4 messageshackers

shayonj@gmail.com

6 months ago

Hello hackers,

I stumbled upon a case and wanted to raise a quick thread to get some
feedback on this.

Basically, dropping a foreign key constraint or a table that owns an FK
currently blocks reads on the other table due to AccessExclusive locks
taken while removing the FK’s internal triggers and constraint metadata. In
busy systems, this short full-read outage can cause user-visible timeouts
for otherwise read-only traffic. Similar topic discussed here as well [1]/messages/by-id/2bdf0fe5679f44ebf56890dfd5c5ead21a0c0bc3.camel@cybertec.at

Example setup
- pktable(id primary key)
- fktable(id primary key, fk references pktable(id))

Where reads get blocked today
- ALTER TABLE fktable DROP CONSTRAINT fkname:
- The FK lives on fktable, and its RI action triggers live on pktable.
- Both tables see AccessExclusive-level effects during removal, so
SELECTs on either can be blocked while the FK and triggers are dropped.

- DROP TABLE fktable:
- fktable is dropped with AccessExclusive (expected).
- While removing RI action triggers on pktable, pktable also sees an
AccessExclusive lock, so SELECTs on pktable can be blocked even though
pktable is not being dropped.

- DROP TABLE pktable CASCADE:
- pktable is dropped with AccessExclusive (expected).
- FK removal on fktable (check triggers) also introduces an
AccessExclusive effect on fktable, so SELECTs on fktable can be blocked.

Proposal
I wanted to see if we could reduce the relation-level lock used
specifically for FK/trigger removal from AccessExclusive to
ShareRowExclusive. This keeps readers moving while still blocking writers
during the small window where RI triggers/constraint rows are removed and
relcache is invalidated. Ideally, with a change like this `ALTER TABLE
fktable DROP CONSTRAINT fkname` should take ShareRowExclusive for fktable
and pktable, and `DROP TABLE fktable` should take ShareRowExclusive for
fktable and AccessExclusive for pktable.

Next, dropping fktable would take ShareRowExclusive for fktable and
AccessExclusive for pktable. And DROP TABLE pktable CASCADE would take
AccessExclusive for pktable and ShareRowExclusive for fktable.

My understanding is that the table being dropped still uses
AccessExclusive; reads/writes on that table remain blocked as today while
it updates relcache, updating metadata/catalog entries and other cleanup
tasks.

Also, I believe ShareRowExclusive would still serialize writers, so there
should be no window where DML can bypass enforcement during removal. So
this change should not affect correctness?

I just attached a small patch to get the idea across. If folks think this
direction makes sense, I am happy to work on a more complete patch as well.
If I’ve missed any critical cases that truly need AccessExclusive at the
relation level during FK/trigger removal, especially around partitions,
pending trigger events, hot standby, logical decoding, event triggers or
something else perhaps please let me know.

[1]: /messages/by-id/2bdf0fe5679f44ebf56890dfd5c5ead21a0c0bc3.camel@cybertec.at
/messages/by-id/2bdf0fe5679f44ebf56890dfd5c5ead21a0c0bc3.camel@cybertec.at

Thanks for your time and feedback.
Shayon

Shayon Mukherjee

shayonj@gmail.com

6 months ago

In reply to: Shayon Mukherjee (#1)

[PATCH] Proposal: Allow reads to proceed during FK/trigger drops by reducing relation-level lock from AccessExclusive to ShareRowExclusive

Hello,

Following up on the previous thread - I took a stab at trying to see what a
full patch for the proposal to reduce lock levels during FK/trigger drops
would look like, and this is what I ended up with.

Recap on the problem space:

When dropping a foreign key constraint (or a table that owns one),
PostgreSQL currently takes AccessExclusiveLock on both the table being
altered and the referenced table for FKs. This blocks all access, including
plain SELECTs, on the other table during FK/trigger removal.

Concrete example:
- Setup: pktable(id PK), fktable(id PK, fk → pktable)
- Action: BEGIN; DROP TABLE fktable
- From another shell: SELECTs on pktable are blocked while fktable's RI
action triggers are removed from pktable, even though pktable itself is not
being dropped

Why it matters:
In busy production systems, this creates a brief but total read outage on
the referenced table. We've seen cases where webhook handlers doing simple
SELECTs hit lock_timeout and failed during a routine table drop, because
the referenced table (tenants) was locked exclusively to remove FK triggers.

My hope with this patch is that it reduces the lock on the referenced table
from AccessExclusive to ShareRowExclusive, allowing SELECTs to proceed
while still blocking writers during the brief FK/trigger removal window.

What this patch does:
- Reduces the relation-level lock from AccessExclusive to ShareRowExclusive
in RemoveConstraintById(), RemoveTriggerById(), and
dropconstraint_internal()
- This allows SELECTs on the other table (e.g., the referenced table in an
FK relationship) to proceed during FK/trigger removal
- The table being directly altered/dropped still gets AccessExclusive as
before
- Writers remain blocked (ShareRowExclusive conflicts with
RowExclusiveLock), so there's no window where DML can bypass enforcement

What it also includes:
- Lock changes in 3 backend files (pg_constraint.c, trigger.c, tablecmds.c)
- Updated isolation tests (detach-partition-concurrently-4 + new
fk-drop-constraint-concurrency)
- Documentation updates (alter_table.sgml)
- 8 test permutations covering: regular FKs, DROP TABLE, self-referential
FKs, ALTER COLUMN TYPE, prepared plans

All FK-related isolation tests pass. The changes are minimal and
surgical—just lock level adjustments with updated comments.

I'd be curious to hear any thoughts/feedback on this, especially:
- Whether the ShareRowExclusive approach makes sense or if there are edge
cases I'm missing
- Hot Standby implications (this only affects primary; standbys still use
AccessExclusive during WAL replay)
- Any concerns around event triggers, extensions, or operational tooling
that might have relied on the stronger lock behavior

Patch attached.

Thanks,
Shayon

On Tue, Oct 7, 2025 at 12:54 PM Shayon Mukherjee <shayonj@gmail.com> wrote:

Show quoted text

Hello hackers,

I stumbled upon a case and wanted to raise a quick thread to get some
feedback on this.

Basically, dropping a foreign key constraint or a table that owns an FK
currently blocks reads on the other table due to AccessExclusive locks
taken while removing the FK’s internal triggers and constraint metadata. In
busy systems, this short full-read outage can cause user-visible timeouts
for otherwise read-only traffic. Similar topic discussed here as well [1]

Example setup
- pktable(id primary key)
- fktable(id primary key, fk references pktable(id))

Where reads get blocked today
- ALTER TABLE fktable DROP CONSTRAINT fkname:
- The FK lives on fktable, and its RI action triggers live on pktable.
- Both tables see AccessExclusive-level effects during removal, so
SELECTs on either can be blocked while the FK and triggers are dropped.

- DROP TABLE fktable:
- fktable is dropped with AccessExclusive (expected).
- While removing RI action triggers on pktable, pktable also sees an
AccessExclusive lock, so SELECTs on pktable can be blocked even though
pktable is not being dropped.

- DROP TABLE pktable CASCADE:
- pktable is dropped with AccessExclusive (expected).
- FK removal on fktable (check triggers) also introduces an
AccessExclusive effect on fktable, so SELECTs on fktable can be blocked.

Proposal
I wanted to see if we could reduce the relation-level lock used
specifically for FK/trigger removal from AccessExclusive to
ShareRowExclusive. This keeps readers moving while still blocking writers
during the small window where RI triggers/constraint rows are removed and
relcache is invalidated. Ideally, with a change like this `ALTER TABLE
fktable DROP CONSTRAINT fkname` should take ShareRowExclusive for fktable
and pktable, and `DROP TABLE fktable` should take ShareRowExclusive for
fktable and AccessExclusive for pktable.

Next, dropping fktable would take ShareRowExclusive for fktable and
AccessExclusive for pktable. And DROP TABLE pktable CASCADE would take
AccessExclusive for pktable and ShareRowExclusive for fktable.

My understanding is that the table being dropped still uses
AccessExclusive; reads/writes on that table remain blocked as today while
it updates relcache, updating metadata/catalog entries and other cleanup
tasks.

Also, I believe ShareRowExclusive would still serialize writers, so there
should be no window where DML can bypass enforcement during removal. So
this change should not affect correctness?

I just attached a small patch to get the idea across. If folks think this
direction makes sense, I am happy to work on a more complete patch as well.
If I’ve missed any critical cases that truly need AccessExclusive at the
relation level during FK/trigger removal, especially around partitions,
pending trigger events, hot standby, logical decoding, event triggers or
something else perhaps please let me know.

[1]
/messages/by-id/2bdf0fe5679f44ebf56890dfd5c5ead21a0c0bc3.camel@cybertec.at

Thanks for your time and feedback.
Shayon

Tom Lane

tgl@sss.pgh.pa.us

6 months ago

In reply to: Shayon Mukherjee (#2)

Re: [PATCH] Proposal: Allow reads to proceed during FK/trigger drops by reducing relation-level lock from AccessExclusive to ShareRowExclusive

Shayon Mukherjee <shayonj@gmail.com> writes:

Following up on the previous thread - I took a stab at trying to see what a
full patch for the proposal to reduce lock levels during FK/trigger drops
would look like, and this is what I ended up with.

I don't think this is safe, at least not for FK removal. There's a
comment in AlterTableGetLockLevel explaining why:

/*
* Removing constraints can affect SELECTs that have been
* optimized assuming the constraint holds true. See also
* CloneFkReferenced.
*/

Adding a foreign key can (and I think does) take a lesser lock,
because the additional constraint won't invalidate any proofs the
optimizer may have made beforehand. But dropping one seems
problematic.

Another issue is that the proposed patch looks like it reduces
the locking level for more types of constraints than just FKs.
That would require further analysis, but I believe that (for
example) dropping a unique constraint likewise risks breaking
existing query plans, even when they aren't directly using that
index.

regards, tom lane

Shayon Mukherjee

shayonj@gmail.com

6 months ago

In reply to: Tom Lane (#3)

Re: [PATCH] Proposal: Allow reads to proceed during FK/trigger drops by reducing relation-level lock from AccessExclusive to ShareRowExclusive

Thank you so much for the review and feedback Tom

On Wed, Oct 8, 2025 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Shayon Mukherjee <shayonj@gmail.com> writes:

Following up on the previous thread - I took a stab at trying to see

what a

full patch for the proposal to reduce lock levels during FK/trigger drops
would look like, and this is what I ended up with.

I don't think this is safe, at least not for FK removal. There's a
comment in AlterTableGetLockLevel explaining why:

/*
* Removing constraints can affect SELECTs that have been
* optimized assuming the constraint holds true. See also
* CloneFkReferenced.
*/

I tried to map my mental model and walk down the code paths, I knew I was
missing something :^). Thank you for pointing this out.

Adding a foreign key can (and I think does) take a lesser lock,
because the additional constraint won't invalidate any proofs the
optimizer may have made beforehand. But dropping one seems
problematic.

You're 100% correct that the table owning the constraint (fktable) needs
AccessExclusive because of the requirement that dropping the FK can
invalidate plans that used it for proofs. That said, I wonder if we can
reliably do something where only the referenced table’s RI action triggers
are weakened to ShareRowExclusive in trigger deletion, which would help us
achieve the criteria that the weaker locks on the referenced table (not the
table being dropped), can accept SELECTs/reads?

Another issue is that the proposed patch looks like it reduces
the locking level for more types of constraints than just FKs.
That would require further analysis, but I believe that (for
example) dropping a unique constraint likewise risks breaking
existing query plans, even when they aren't directly using that
index.

That makes sense, yes. I attempted at an updated patch with reduced locks
such that it's only for FK RI action triggers on the referenced table with
no lock reduction applied to referenced tables for other constraint types.
That is, if my understanding and mental model is right here.

I attached an updated patch with the new discoveries and the changes made
are:

- RemoveConstraintById(): Reverted to AccessExclusiveLock
- RemoveTriggerById(): Now only uses ShareRowExclusiveLock for internal RI
action triggers on the referenced table (confrelid). All other triggers
(user triggers, RI check triggers on the FK table) still use
AccessExclusiveLock.
- dropconstraint_internal(): ShareRowExclusiveLock only for FK drops on the
referenced table (already inside `if CONSTRAINT_FOREIGN` block)
- ATPostAlterTypeCleanup(): Added constraint type check; only FK rebuilds
use ShareRowExclusive

I feel like I could still be missing something, so I really appreciate any
feedback. Definitely not trying to push hard on this and more so just using
this as a learning opportunity as well.

Thank you
Shayon

Proposal for discussions: Allow reads to proceed during FK/trigger drops by reducing relation-level lock from AccessExclusive to ShareRowExclusive

Attachments:

Attachments:

Attachments: