Plan caching and serialization for reuse across executions
Hi hackers,
I am currently exploring whether it is possible to cache a plan generated by the PostgreSQL planner and reuse it across later executions.
I understand that there are existing mechanisms like PREPARE/EXECUTE and CachedPlan in plancache.c, but these are tied to prepared statements and session-level usage. My interest is more on the research side:
- Is there any supported (or experimental) way to serialize a Plan or PlannedStmt structure and reload it in a future execution, bypassing the planner?
- If not, would it make sense to extend the existing CachedPlan mechanism to support serialization/deserialization, or are there known architectural reasons why this approach is discouraged?
- Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful.
The motivation is research-oriented: I want to experiment with plan persistence as a way to reduce planning overhead and study trade-offs of reusing cached plans.
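For concreteness, the minimal thing I have in mind looks roughly like this (an untested sketch; I am assuming the out/read node functions that already ship plans to parallel workers cover PlannedStmt, and plan_to_text/plan_from_text are just illustrative names):

/*
 * Sketch only: serialize a planner result to text and rebuild it later,
 * relying on the existing outfuncs.c/readfuncs.c support for plan nodes.
 */
#include "postgres.h"
#include "nodes/nodes.h"        /* nodeToString(), stringToNode() */
#include "nodes/plannodes.h"    /* PlannedStmt */

/* Serialize a plan tree to its text representation (outfuncs.c). */
static char *
plan_to_text(PlannedStmt *stmt)
{
    return nodeToString(stmt);
}

/* Rebuild the plan tree from that text (readfuncs.c). */
static PlannedStmt *
plan_from_text(char *serialized)
{
    return (PlannedStmt *) stringToNode(serialized);
}

I realise the serialization itself is probably the easy part; proving that a reloaded plan is still valid for a later query seems to be where the interesting questions are.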
Thanks a lot for your guidance!
Best regards,
Xuan Chen
Hi!
On Sun, 17 Aug 2025 at 23:34, Xuan Chen <cx0222@vip.qq.com> wrote:
- Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful.
Here is an extension implementing saving and restoring plans [0].
[0]: https://github.com/postgrespro/sr_plan
--
Best regards,
Kirill Reshke
On 18/8/2025 13:50, Kirill Reshke wrote:
Hi!
On Sun, 17 Aug 2025 at 23:34, Xuan Chen <cx0222@vip.qq.com> wrote:
- Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful.
Here is an extension implementing saving and restoring plans [0].
I wouldn't recommend this obsolete version of the extension. Since the
QueryId was introduced, we have redesigned it extensively, with the
basic idea that QueryId serves as a 'query plan class' and that matching
parse trees provide a proof of matching incoming queries with their
corresponding [parameterised] plans.
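In rough terms, the lookup works like the following (just a sketch, not the extension's actual code; FrozenPlanEntry and lookup_frozen_entries() are made-up helpers, while equal() is the real node-tree comparison from equalfuncs.c):

#include "postgres.h"
#include "nodes/nodes.h"        /* equal() */
#include "nodes/parsenodes.h"   /* Query */
#include "nodes/pg_list.h"
#include "nodes/plannodes.h"    /* PlannedStmt */

typedef struct FrozenPlanEntry
{
    uint64       queryId;       /* the 'query plan class' */
    Query       *parsetree;     /* stored [parameterised] parse tree */
    PlannedStmt *plan;          /* the frozen plan */
} FrozenPlanEntry;

/* Hypothetical: all stored entries sharing the given queryId. */
extern List *lookup_frozen_entries(uint64 queryId);

static PlannedStmt *
find_frozen_plan(Query *incoming)
{
    List       *candidates = lookup_frozen_entries(incoming->queryId);
    ListCell   *lc;

    foreach(lc, candidates)
    {
        FrozenPlanEntry *entry = (FrozenPlanEntry *) lfirst(lc);

        /* queryId only narrows the search; the parse-tree match is the proof */
        if (equal(incoming, entry->parsetree))
            return entry->plan;
    }

    return NULL;                /* no match, fall back to the planner */
}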
--
regards, Andrei Lepikhov
On 16/8/2025 14:32, Xuan Chen wrote:
Hi hackers,
I am currently exploring whether it is possible to cache a plan generated by the PostgreSQL planner and reuse it across later executions.
I understand that there are existing mechanisms like PREPARE/EXECUTE and CachedPlan in plancache.c, but these are tied to prepared statements and session-level usage. My interest is more on the research side:
- Is there any supported (or experimental) way to serialize a Plan or PlannedStmt structure and reload it in a future execution, bypassing the planner?
- If not, would it make sense to extend the existing CachedPlan mechanism to support serialization/deserialization, or are there known architectural reasons why this approach is discouraged?
- Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful.
The motivation is research-oriented: I want to experiment with plan persistence as a way to reduce planning overhead and study trade-offs of reusing cached plans.
Thanks a lot for your guidance!
Yes, it is possible. See [1] to find out how it is implemented in the
Postgres architecture.
We implemented 'freezing' of a statement, with the parameterisation defined
by the user. Each incoming query (constant or parameterised) is first matched
by QueryId. Then, by matching the incoming parse tree against the stored one,
the extension finds the corresponding plan and proves that it may be used for
the query (remember that rewriting rules, indexes, and other factors may have
changed). Such a 'frozen' plan is serialised into shared memory, lazily
transferred to all backends, and used across all instances.
Many issues remained open for me after this project finished, so there is
plenty of room for problems you could address in your research ;).
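To illustrate the shared-memory part, a minimal sketch (again, not the extension's actual code; plan_dsa is a hypothetical dsa_area created at startup and attached by every backend):

#include "postgres.h"
#include "nodes/nodes.h"        /* nodeToString(), stringToNode() */
#include "nodes/plannodes.h"    /* PlannedStmt */
#include "utils/dsa.h"

extern dsa_area *plan_dsa;      /* hypothetical, attached by every backend */

/* Serialise the frozen plan into the shared area. */
static dsa_pointer
stash_plan(PlannedStmt *stmt)
{
    char       *s = nodeToString(stmt);
    Size        len = strlen(s) + 1;
    dsa_pointer dp = dsa_allocate(plan_dsa, len);

    memcpy(dsa_get_address(plan_dsa, dp), s, len);
    pfree(s);
    return dp;
}

/* Any backend attached to plan_dsa can lazily rebuild the plan tree. */
static PlannedStmt *
load_plan(dsa_pointer dp)
{
    char       *s = (char *) dsa_get_address(plan_dsa, dp);

    return (PlannedStmt *) stringToNode(s);
}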
[1]: https://github.com/danolivo/conf/blob/main/2023-PGDay-Israel/sr-plan.pdf
--
regards, Andrei Lepikhov
Andrei Lepikhov <lepihov@gmail.com> writes:
I wouldn't recommend this obsolete version of the extension. Since the
QueryId was introduced, we have redesigned it extensively, with the
basic idea that QueryId serves as a 'query plan class' and that matching
parse trees provide a proof of matching incoming queries with their
corresponding [parameterised] plans.
That's fairly scary, considering all the work that's been done to
intentionally fuzz different queries together to make them look
like "the same query" to pg_stat_statements. While that's fine
for pg_stat_statements, it makes it really questionable to assume
that QueryId is a sufficient proof of semantic equivalence.
(I recall having objected to the whole concept of moving queryjumble.c
into core on the grounds that it could not serve multiple masters.
I think this proves my point...)
regards, tom lane
Hi,
On Mon, Aug 18, 2025 at 10:18:34AM -0400, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
I wouldn't recommend this obsolete version of the extension. Since the
QueryId was introduced, we have redesigned it extensively, with the
basic idea that QueryId serves as a 'query plan class' and that matching
parse trees provide a proof of matching incoming queries with their
corresponding [parameterised] plans.
That's fairly scary, considering all the work that's been done to
intentionally fuzz different queries together to make them look
like "the same query" to pg_stat_statements. While that's fine
for pg_stat_statements, it makes it really questionable to assume
that QueryId is a sufficient proof of semantic equivalence.
(I recall having objected to the whole concept of moving queryjumble.c
into core on the grounds that it could not serve multiple masters.
I think this proves my point...)
I don't think it really proves your point: the queryid is only a hash, and
collisions are possible. So even if the core jumbling didn't fuzz different
queries together, you would still need a secondary check to ensure
correctness, and IIUC Andrei said that they use the actual parsetree for
that. I wrote a prototype of an extension implementing a shared plan cache
some years ago and did something similar. Even if the queryid is not (and
cannot be) perfect, it's still highly useful in many situations.
On 18/8/2025 16:18, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
I wouldn't recommend this obsolete version of the extension. Since the
QueryId was introduced, we have redesigned it extensively, with the
basic idea that QueryId serves as a 'query plan class' and that matching
parse trees provide a proof of matching incoming queries with their
corresponding [parameterised] plans.
That's fairly scary, considering all the work that's been done to
intentionally fuzz different queries together to make them look
like "the same query" to pg_stat_statements. While that's fine
for pg_stat_statements, it makes it really questionable to assume
that QueryId is a sufficient proof of semantic equivalence.
(I recall having objected to the whole concept of moving queryjumble.c
into core on the grounds that it could not serve multiple masters.
I think this proves my point...)
I recall that this feature is on the "we never want it" list. It was
designed using a core patch and includes a custom, more selective query
ID. That's why I recently tried to discuss the extensibility of query
IDs in several threads.
As mentioned earlier, the query ID is utilised to narrow the search
space, resulting in no more than two to three additional costly checks,
according to my research. This allows the feature to be both practical
and efficient. The accuracy of the proof is determined by a parse tree
match, along with a few additional checks on the existence of indexes
and stuff like that.
I have never considered contributing any part of this to core, just as
pg_hint_plan has stayed out of it.
--
regards, Andrei Lepikhov