GetCachedPlan() refactor: move execution lock acquisition out

Started by Amit Langote3 days ago6 messageshackers

Langote_Amit_f8@lab.ntt.co.jp

3 days ago

Over in the "generic plans and initial pruning" thread [1]/messages/by-id/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com, I came to
think that the cleanest way to address the architectural issue behind
that thread is to make GetCachedPlan() do less execution-related work.
Specifically, it should stop acquiring execution locks itself, and
leave that decision to callers.

GetCachedPlan() has long combined plan acquisition with execution
setup, and that coupling makes it harder to improve either side
cleanly. Tom pointed in this direction back in 2023 [2]/messages/by-id/4191508.1674157166@sss.pgh.pa.us:

"...the whole problem arises because the system is wrongly factored.
We should get rid of AcquireExecutorLocks altogether, allowing the
plancache to hand back a generic plan that it's not certain of the
validity of, and instead integrate the responsibility for acquiring
locks into executor startup."

This patch takes the first half of that suggestion by moving execution
locking out of the plancache and into its callers.

The attached patch does that, with no intended behavioral change:

* Remove AcquireExecutorLocks() from CheckCachedPlan(), so
GetCachedPlan() returns a plan without holding execution locks.

* Add a new internal helper in plancache.c, LockCachedPlan(), which
acquires execution locks, rechecks validity, and returns false if the
plan was invalidated, so the caller can retry.

* Add GetCachedPlanLocked() as a convenience wrapper that fetches a
plan, calls LockCachedPlan(), and retries on invalidation. This
becomes the normal entry point for callers that want a plan ready for
execution.

* Convert existing GetCachedPlan() callers to use GetCachedPlanLocked().

The second half of Tom's suggestion, moving that responsibility into
executor startup, is trickier. ExecutorStart() is a stable API with
extension hooks, so changing its contract has a fairly high bar. I
tried that route once already [3]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1722d5eb05d8e5d2e064cd1798abcae4f296ca9d.

A working variant that adds an ExecutorPrep() entry point, with a
wrapper implementing pruning-aware locking on top, was discussed in
the original thread [1]/messages/by-id/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com. But rather than propose that whole stack at
once, I think it is better to do this in phases: first decouple plan
acquisition from execution locking, then revisit the ExecutorPrep()
piece separately if this goes in.

The eventual goal is still pruning-aware locking. I'm posting this
piece separately because the original thread has become fairly long,
and this part seems easier to review on its own.

[1]: /messages/by-id/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
[2]: /messages/by-id/4191508.1674157166@sss.pgh.pa.us
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1722d5eb05d8e5d2e064cd1798abcae4f296ca9d

--
Thanks, Amit Langote

Kirill Reshke

reshkekirill@gmail.com

3 days ago

In reply to: Amit Langote (#1)

Re: GetCachedPlan() refactor: move execution lock acquisition out

On Tue, 14 Apr 2026 at 13:20, Amit Langote <amitlangote09@gmail.com> wrote:

Over in the "generic plans and initial pruning" thread [1], I came to
think that the cleanest way to address the architectural issue behind
that thread is to make GetCachedPlan() do less execution-related work.
Specifically, it should stop acquiring execution locks itself, and
leave that decision to callers.

GetCachedPlan() has long combined plan acquisition with execution
setup, and that coupling makes it harder to improve either side
cleanly. Tom pointed in this direction back in 2023 [2]:

"...the whole problem arises because the system is wrongly factored.
We should get rid of AcquireExecutorLocks altogether, allowing the
plancache to hand back a generic plan that it's not certain of the
validity of, and instead integrate the responsibility for acquiring
locks into executor startup."

This patch takes the first half of that suggestion by moving execution
locking out of the plancache and into its callers.

The attached patch does that, with no intended behavioral change:

* Remove AcquireExecutorLocks() from CheckCachedPlan(), so
GetCachedPlan() returns a plan without holding execution locks.

* Add a new internal helper in plancache.c, LockCachedPlan(), which
acquires execution locks, rechecks validity, and returns false if the
plan was invalidated, so the caller can retry.

* Add GetCachedPlanLocked() as a convenience wrapper that fetches a
plan, calls LockCachedPlan(), and retries on invalidation. This
becomes the normal entry point for callers that want a plan ready for
execution.

* Convert existing GetCachedPlan() callers to use GetCachedPlanLocked().

The second half of Tom's suggestion, moving that responsibility into
executor startup, is trickier. ExecutorStart() is a stable API with
extension hooks, so changing its contract has a fairly high bar. I
tried that route once already [3].

A working variant that adds an ExecutorPrep() entry point, with a
wrapper implementing pruning-aware locking on top, was discussed in
the original thread [1]. But rather than propose that whole stack at
once, I think it is better to do this in phases: first decouple plan
acquisition from execution locking, then revisit the ExecutorPrep()
piece separately if this goes in.

The eventual goal is still pruning-aware locking. I'm posting this
piece separately because the original thread has become fairly long,
and this part seems easier to review on its own.

[1] /messages/by-id/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
[2] /messages/by-id/4191508.1674157166@sss.pgh.pa.us
[3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1722d5eb05d8e5d2e064cd1798abcae4f296ca9d

--
Thanks, Amit Langote

Hi! I have slightly checked v1. Can't tell if refactoring does make
cached plan maintenance more straightforward because of lack of
experience in this area, but here's my 2c

+
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv);
+ if (LockCachedPlan(cplan))
+ break;
+
+ ReleaseCachedPlan(cplan, owner);
+ }

Should we use CFI here?

+static bool
+LockCachedPlan(CachedPlan *cplan)
+{
+ AcquireExecutorLocks(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocks(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}

simply `return cplan->is_valid ` would be more Postgres-y here, isnt it?

--
Best regards,
Kirill Reshke

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

2 days ago

In reply to: Kirill Reshke (#2)

Re: GetCachedPlan() refactor: move execution lock acquisition out

Hi,

On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 14 Apr 2026 at 13:20, Amit Langote <amitlangote09@gmail.com> wrote:

Over in the "generic plans and initial pruning" thread [1], I came to
think that the cleanest way to address the architectural issue behind
that thread is to make GetCachedPlan() do less execution-related work.
Specifically, it should stop acquiring execution locks itself, and
leave that decision to callers.

GetCachedPlan() has long combined plan acquisition with execution
setup, and that coupling makes it harder to improve either side
cleanly. Tom pointed in this direction back in 2023 [2]:

"...the whole problem arises because the system is wrongly factored.
We should get rid of AcquireExecutorLocks altogether, allowing the
plancache to hand back a generic plan that it's not certain of the
validity of, and instead integrate the responsibility for acquiring
locks into executor startup."

This patch takes the first half of that suggestion by moving execution
locking out of the plancache and into its callers.

The attached patch does that, with no intended behavioral change:

* Remove AcquireExecutorLocks() from CheckCachedPlan(), so
GetCachedPlan() returns a plan without holding execution locks.

* Add a new internal helper in plancache.c, LockCachedPlan(), which
acquires execution locks, rechecks validity, and returns false if the
plan was invalidated, so the caller can retry.

* Add GetCachedPlanLocked() as a convenience wrapper that fetches a
plan, calls LockCachedPlan(), and retries on invalidation. This
becomes the normal entry point for callers that want a plan ready for
execution.

* Convert existing GetCachedPlan() callers to use GetCachedPlanLocked().

The second half of Tom's suggestion, moving that responsibility into
executor startup, is trickier. ExecutorStart() is a stable API with
extension hooks, so changing its contract has a fairly high bar. I
tried that route once already [3].

A working variant that adds an ExecutorPrep() entry point, with a
wrapper implementing pruning-aware locking on top, was discussed in
the original thread [1]. But rather than propose that whole stack at
once, I think it is better to do this in phases: first decouple plan
acquisition from execution locking, then revisit the ExecutorPrep()
piece separately if this goes in.

The eventual goal is still pruning-aware locking. I'm posting this
piece separately because the original thread has become fairly long,
and this part seems easier to review on its own.

Hi! I have slightly checked v1. Can't tell if refactoring does make
cached plan maintenance more straightforward because of lack of
experience in this area, but here's my 2c

Thanks for taking a look.

Could you say a bit more about what aspect of cached plan maintenance
you have in mind? My main motivation here is to separate plan
acquisition from execution-time locking, so that locking policy no
longer has to live inside GetCachedPlan().

+
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv);
+ if (LockCachedPlan(cplan))
+ break;
+
+ ReleaseCachedPlan(cplan, owner);
+ }

Should we use CFI here?

Good point. On a closer look, I think the retry loop can go away
entirely. If LockCachedPlan() fails, releasing the plan and calling
GetCachedPlan() again is enough, because that will rebuild the plan
and the needed locks will already be held by the planner. I'll
simplify that in the next version.

+static bool
+LockCachedPlan(CachedPlan *cplan)
+{
+ AcquireExecutorLocks(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocks(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}

simply `return cplan->is_valid ` would be more Postgres-y here, isnt it?

Agreed, will fix too.

I'll hold off on posting a v2 for now.

--
Thanks, Amit Langote

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

2 days ago

In reply to: Amit Langote (#3)

Re: GetCachedPlan() refactor: move execution lock acquisition out

On Wed, Apr 15, 2026 at 11:14 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
+static bool
+LockCachedPlan(CachedPlan *cplan)
+{
+ AcquireExecutorLocks(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocks(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
simply `return cplan->is_valid ` would be more Postgres-y here, isnt it?
Agreed, will fix too.

Thinking more about this bit, checking only cplan->is_valid after
locking is not really equivalent to what CheckCachedPlan() was doing
before.

CheckCachedPlan() can also reset plan->is_valid based on other state
(role, xmin), whereas in v1 the new check relied only on lock inval
callbacks to have set is_valid to false. That means the new check is
not enough. It may happen to be okay for the current callers in this
patch, but it is not something I think is safe to rely on once there
are future callers that do arbitrary work, such as ExecutorPrep(),
even if carefully coded, between the original GetCachedPlan() /
CheckCachedPlan() step and the later recheck after locking.

Separately, I also realized that v1 was introducing redundant lock
acquisition for freshly built plans, which is not really a no behavior
change refactoring either.

So in v2 I ended up reworking two parts more substantially:

* The patch now factors the reused-generic-plan validity logic into
RecheckCachedPlan().

* It also adds an is_reused output argument to GetCachedPlan(), so
callers can distinguish the reused-plan case from a freshly built plan
and avoid the extra lock step in the latter case.

--
Thanks, Amit Langote

Chao Li

li.evan.chao@gmail.com

1 day ago

In reply to: Amit Langote (#4)

Re: GetCachedPlan() refactor: move execution lock acquisition out

On Apr 15, 2026, at 21:46, Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Apr 15, 2026 at 11:14 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
+static bool
+LockCachedPlan(CachedPlan *cplan)
+{
+ AcquireExecutorLocks(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocks(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
simply `return cplan->is_valid ` would be more Postgres-y here, isnt it?
Agreed, will fix too.
Thinking more about this bit, checking only cplan->is_valid after
locking is not really equivalent to what CheckCachedPlan() was doing
before.

CheckCachedPlan() can also reset plan->is_valid based on other state
(role, xmin), whereas in v1 the new check relied only on lock inval
callbacks to have set is_valid to false. That means the new check is
not enough. It may happen to be okay for the current callers in this
patch, but it is not something I think is safe to rely on once there
are future callers that do arbitrary work, such as ExecutorPrep(),
even if carefully coded, between the original GetCachedPlan() /
CheckCachedPlan() step and the later recheck after locking.

Separately, I also realized that v1 was introducing redundant lock
acquisition for freshly built plans, which is not really a no behavior
change refactoring either.

So in v2 I ended up reworking two parts more substantially:

* The patch now factors the reused-generic-plan validity logic into
RecheckCachedPlan().

* It also adds an is_reused output argument to GetCachedPlan(), so
callers can distinguish the reused-plan case from a freshly built plan
and avoid the extra lock step in the latter case.

--
Thanks, Amit Langote
<v2-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch>

V2 overall looks good, it preserves the current behavior.

I just have a question. RecheckCachedPlan is only called by LockCachedPlan and CheckCachedPlan, and both callers are static, why RecheckCachedPlan is exported?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

1 day ago

In reply to: Chao Li (#5)

Re: GetCachedPlan() refactor: move execution lock acquisition out

Hi,

On Thu, Apr 16, 2026 at 11:35 AM Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 15, 2026, at 21:46, Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Apr 15, 2026 at 11:14 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
+static bool
+LockCachedPlan(CachedPlan *cplan)
+{
+ AcquireExecutorLocks(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocks(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
simply `return cplan->is_valid ` would be more Postgres-y here, isnt it?
Agreed, will fix too.
Thinking more about this bit, checking only cplan->is_valid after
locking is not really equivalent to what CheckCachedPlan() was doing
before.

CheckCachedPlan() can also reset plan->is_valid based on other state
(role, xmin), whereas in v1 the new check relied only on lock inval
callbacks to have set is_valid to false. That means the new check is
not enough. It may happen to be okay for the current callers in this
patch, but it is not something I think is safe to rely on once there
are future callers that do arbitrary work, such as ExecutorPrep(),
even if carefully coded, between the original GetCachedPlan() /
CheckCachedPlan() step and the later recheck after locking.

Separately, I also realized that v1 was introducing redundant lock
acquisition for freshly built plans, which is not really a no behavior
change refactoring either.

So in v2 I ended up reworking two parts more substantially:

* The patch now factors the reused-generic-plan validity logic into
RecheckCachedPlan().

* It also adds an is_reused output argument to GetCachedPlan(), so
callers can distinguish the reused-plan case from a freshly built plan
and avoid the extra lock step in the latter case.
V2 overall looks good, it preserves the current behavior.

I just have a question. RecheckCachedPlan is only called by LockCachedPlan and CheckCachedPlan, and both callers are static, why RecheckCachedPlan is exported?

Oh, I thought someone would ask that. Yes, it could stay static in
plancache.c for now and only be exported later, when some future
smarter cached-plan locker outside plancache.c actually needs it.

--
Thanks, Amit Langote

GetCachedPlan() refactor: move execution lock acquisition out

Attachments:

Attachments: