Proposal - Allow extensions to set a Plan Identifier
Hi,
A patch by Lukas Fittl [1]/messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com introduces the pg_stat_plans extension, which
exposes execution plans along with execution statistics. As part of this work,
a new infrastructure is being proposed to compute a plan identifier (planId),
which is exposed in both pg_stat_plans and pg_stat_activity.
Exposing planId in pg_stat_activity enables users to identify long-running
or high-load plans and correlate them with plan text from the extension.
One of open question in [1]/messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com is which components of a plan tree should be
considered when computing planId. Lukas has created a wiki [2]/messages/by-id/CAP53PkxocbNr+eRag3FEJp3-7S1U80FspOg8UQjO902TWMG=6A@mail.gmail.com for discussion,
but reaching a consensus may take time—possibly in time for v19, but
it's uncertain.
Meanwhile, existing extensions like pg_stat_monitor [3]https://github.com/percona/pg_stat_monitor compute a planId and
store the plan text, but they lack a way to expose planId in pg_stat_activity.
This limits their usefulness, as identifying top or long-running plans from
pg_stat_activity is critical for monitoring.
To address this, I propose leveraging work from [1]/messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com and allow extensions
to set a planId This could be implemented sooner than an in-core
computed planId.
Proposal Overview:
1/ Add a planId field to PgBackendStatus.
2/ Add a planId field to PlannedStmt.
3/ Create APIs for extensions to set the current planId.
Storing planId in PlannedStmt ensures it is available with
cached plans.
One consideration is whether reserving a 64-bit integer in PgBackendStatus
and PlannedStmt is acceptable, given that planId may not always be used.
However, this is already the case for queryId when compute_query_id is
disabled and no extension sets it, so it may not be a concern.
Looking forward to feedback on this approach.
If there is agreement, I will work on preparing the patches.
Regards,
Sami Imseih
Amazon Web Services (AWS)
[1]: /messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com
[2]: /messages/by-id/CAP53PkxocbNr+eRag3FEJp3-7S1U80FspOg8UQjO902TWMG=6A@mail.gmail.com
[3]: https://github.com/percona/pg_stat_monitor
Hi,
On Wed, Feb 12, 2025 at 05:44:20PM -0600, Sami Imseih wrote:
Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and
store the plan text, but they lack a way to expose planId in pg_stat_activity.
This limits their usefulness, as identifying top or long-running plans from
pg_stat_activity is critical for monitoring.
Agree.
To address this, I propose leveraging work from [1] and allow extensions
to set a planId This could be implemented sooner than an in-core
computed planId.
I think that makes sense and then the idea would be later on to move to something
like 5fd9dfa5f50, but for the "planId": is my understanding correct?
Proposal Overview:
1/ Add a planId field to PgBackendStatus.
2/ Add a planId field to PlannedStmt.
3/ Create APIs for extensions to set the current planId.
Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also
through some hooks). But did not look close enough and would probably depends
of the implementation, so let's see.
Storing planId in PlannedStmt ensures it is available with
cached plans.One consideration is whether reserving a 64-bit integer in PgBackendStatus
and PlannedStmt is acceptable, given that planId may not always be used.
From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt
is 152 bytes. It looks like that adding an uint64 at the end of those structs
would not add extra padding (that's what "ptype /o struct <struct>" tells me)
so would increase to 440 bytes and 160 bytes respectively. I don't think that's
an issue.
However, this is already the case for queryId when compute_query_id is
disabled and no extension sets it
Yup.
Looking forward to feedback on this approach.
If there is agreement, I will work on preparing the patches.
That sounds like a plan for me but better to wait for others to share their
thoughts too.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Thanks for the feedback!
I think that makes sense and then the idea would be later on to move to something
like 5fd9dfa5f50, but for the "planId": is my understanding correct?
correct. This is adding infrastructure to eventually have an in-core
planId; but in the
meanwhile give extensions that already compute a planId the ability to
broadcast the planId in pg_stat_activity.
Proposal Overview:
1/ Add a planId field to PgBackendStatus.
2/ Add a planId field to PlannedStmt.
3/ Create APIs for extensions to set the current planId.Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also
through some hooks). But did not look close enough and would probably depends
of the implementation, so let's see.
I don't think direct setting of values is a good idea. We will need an API
similar to pgstat_report_query_id which ensures we are only reporting top
level planIds -and- in the case of multiple extensions with the capability
to set a planId, only the first one in the stack wins. pgstat_report_query_id
does allow for forcing a queryId ( force flag which is false by default ), and
I think this new API should allow the same.
Storing planId in PlannedStmt ensures it is available with
cached plans.One consideration is whether reserving a 64-bit integer in PgBackendStatus
and PlannedStmt is acceptable, given that planId may not always be used.From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt
is 152 bytes. It looks like that adding an uint64 at the end of those structs
would not add extra padding (that's what "ptype /o struct <struct>" tells me)
so would increase to 440 bytes and 160 bytes respectively. I don't think that's
an issue.
Correct, thanks for adding this detail.
--
Sami
On Thu, Feb 13, 2025 at 11:10:27AM -0600, Sami Imseih wrote:
I don't think direct setting of values is a good idea. We will need an API
similar to pgstat_report_query_id which ensures we are only reporting top
level planIds -and- in the case of multiple extensions with the capability
to set a planId, only the first one in the stack wins. pgstat_report_query_id
does allow for forcing a queryId ( force flag which is false by default ), and
I think this new API should allow the same.
Yep. The main idea behind that is to offer to extensions one single
and consistent way to set this kind of data. In short, we have use
cases where we want to be able to associate an ID with a plan tree,
we're not sure which algorithm would be the best one to use in core.
One thing that we are sure about is that there is no easy way to
figure out through the planner hook across multiple extensions if this
has been set with an in-core structure. Sure, you can do that with a
shmem area associated to one of the extensions, but it seems just
easier to plug in a set/get logic into core, including a way to force
setting the ID?
The key point to me is to add that to the backend entry structure and
a plan node structure, so as it is possible to track that through the
planner hook for the planner structure. The backend entry is for
monitoring, not really pg_stat_activity first.
I'm obviously siding with this proposal because we have an ask to
track this kind of data, and because this kind of data is kind of hard
to track across a stack of extensions through the core backend code.
Point is, would others be interested in this addition or just object
to it because it touches the core code?
--
Michael
On 14/2/2025 08:21, Michael Paquier wrote:
On Thu, Feb 13, 2025 at 11:10:27AM -0600, Sami Imseih wrote:
I don't think direct setting of values is a good idea. We will need an API
similar to pgstat_report_query_id which ensures we are only reporting top
level planIds -and- in the case of multiple extensions with the capability
to set a planId, only the first one in the stack wins. pgstat_report_query_id
does allow for forcing a queryId ( force flag which is false by default ), and
I think this new API should allow the same.I'm obviously siding with this proposal because we have an ask to
track this kind of data, and because this kind of data is kind of hard
to track across a stack of extensions through the core backend code.Point is, would others be interested in this addition or just object
to it because it touches the core code?
I have already implemented it twice in different ways as a core patch.
In my projects, we need to track queryId and plan node ID for two reasons:
1. Optimisational decisions made during transformation/path generation
stages up to the end of execution to correct them in the future.
2. Cache information about the query tree/node state to use it for
statistical purposes.
In my experience, we don't need a single plan_id field; we just need an
'extended list' pointer at the end of the Plan, PlannedStmt, Query, and
RelOptInfo structures and a hook at the end of the create_plan_recurse()
to allow passing some info from the path generator to the plan tree.
An extension may add its data to the list (we may register an extensible
node type to be sure we don't interfere with other extensions) and
manipulate it in a custom way and with custom UI.
Generally, it makes the optimiser internals more open to extensions.
--
regards, Andrei Lepikhov
On Sat, Feb 15, 2025 at 10:29:41AM +0100, Andrei Lepikhov wrote:
I have already implemented it twice in different ways as a core patch.
In my projects, we need to track queryId and plan node ID for two reasons:
Are these available in the public somewhere or is that simply what
Sami is proposing?
1. Optimisational decisions made during transformation/path generation
stages up to the end of execution to correct them in the future.
2. Cache information about the query tree/node state to use it for
statistical purposes.
Gathering of statistical data based on a node tree is one reason,
where it may or may not be required to walk through a path.
Influencing the plan used with an already-generated one (where hints
could be used) was the second one, mostly replacing a plan in the
planner hook. Influencing the paths in a plan or a subplan didn't
really matter much with hints to drive the paths.
In my experience, we don't need a single plan_id field; we just need an
'extended list' pointer at the end of the Plan, PlannedStmt, Query, and
RelOptInfo structures and a hook at the end of the create_plan_recurse() to
allow passing some info from the path generator to the plan tree.
An extension may add its data to the list (we may register an extensible
node type to be sure we don't interfere with other extensions) and
manipulate it in a custom way and with custom UI.
Generally, it makes the optimiser internals more open to extensions.
Sounds to me that this maps with the addition of a "private" area to
some of the plan structures to allow extensions to attach some data
that would be reused elsewhere, which is rather independent than
what's suggested here? These kinds of private areas could be to a
list, or even a different structure, depending on your extension
design.
--
Michael
On Sun, Feb 16, 2025 at 5:34 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Feb 15, 2025 at 10:29:41AM +0100, Andrei Lepikhov wrote:
I have already implemented it twice in different ways as a core patch.
In my projects, we need to track queryId and plan node ID for two reasons:Are these available in the public somewhere or is that simply what
Sami is proposing?
Andrei, you mention "plan node ID" which if I understand correctly will be
a good thing to expose to determine which part of the plan most of the time
is spent. This is similar to an idea I raised in a different thread for
explain plan progress [1]/messages/by-id/CAA5RZ0uGDKWxqUCMrsWKV425T2f6mqJsXKg6chq+WuyCwNPUGw@mail.gmail.com..
However, I am proposing something different, which is we track a plan_id of the
full execution plan. Some extensions may hash the text version of the plan,
others may choose to do something more elaborate such as "jumbling" a plan
tree. The point becomes is that monitoring extensions such as
pg_stat_monitor and
others can set the plan_id in core so it's available in backend status.
1. Optimisational decisions made during transformation/path generation
stages up to the end of execution to correct them in the future.
2. Cache information about the query tree/node state to use it for
statistical purposes.Gathering of statistical data based on a node tree is one reason,
where it may or may not be required to walk through a path.
Influencing the plan used with an already-generated one (where hints
could be used) was the second one, mostly replacing a plan in the
planner hook. Influencing the paths in a plan or a subplan didn't
really matter much with hints to drive the paths.In my experience, we don't need a single plan_id field; we just need an
'extended list' pointer at the end of the Plan, PlannedStmt, Query, and
RelOptInfo structures and a hook at the end of the create_plan_recurse() to
allow passing some info from the path generator to the plan tree.
An extension may add its data to the list (we may register an extensible
node type to be sure we don't interfere with other extensions) and
manipulate it in a custom way and with custom UI.
Generally, it makes the optimiser internals more open to extensions.Sounds to me that this maps with the addition of a "private" area to
some of the plan structures to allow extensions to attach some data
that would be reused elsewhere, which is rather independent than
what's suggested here?
+1, such a private area is different from what is being proposed.
[1]: /messages/by-id/CAA5RZ0uGDKWxqUCMrsWKV425T2f6mqJsXKg6chq+WuyCwNPUGw@mail.gmail.com.
--
Sami
I put together patches to do as is being proposed.
v1-0001:
1. Adds a planId field in PlannedStmt
2. Added an st_plan_id fields in PgBackendStatus
3. APIs to report and to retrieve a planId to PgBackendStatus
An extension is able to set a planId in PlannedStmt directly,
and while they can do the same for PgBackendStatus, I felt it is better
to provide similar APIs for this as we do for queryId. Unless the extension
passed force=true to the API, this will ensure that a planId already set
either by a top-level statement or by another extension cannot be
set when a plan is active.
Also, the extension does not need to worry about resetting the planId at
the end of execution as this will follow the same mechanism as is currently
being done for queryId. This will remove responsibility from the extension
author to have to manage this aspect.
The extension should normally compute the planId in the planner or ExecutorStart
hook. If the planId is computed in the planner_hook, the extension can set the
planId in plannedStmt making the planId available to subsequent executions of
a cached plan.
What this patch does not cover is adding a "Plan Identifier" to explain output
or to logs. Also, there is no core GUC like compute_query_id, since it does not
make sense if we're not computing a planId in core.
v2-0001:
This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
An extension
can offer its own view, similar to pg_stat_activity, which can include planId.
Note that there are no documentation updates required here as we don't
have per-field documentation for pg_stat_get_activity.
These 2 patches can probably be combined , but will leave them like
this for now.
Looking forward to feedback.
Regards
Sami
Attachments:
v1-0002-add-planId-to-pg_stat_get_activity.patchapplication/octet-stream; name=v1-0002-add-planId-to-pg_stat_get_activity.patchDownload+14-9
v1-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patchapplication/octet-stream; name=v1-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patchDownload+68-1
Hi,
Thank you for the patches. They're very important for many extensions.
I've debugged them using simple extensions pg_stat_statements and
auto_explain, specifically checking cases involving PlannedStmt
and pg_stat_get_activity - , and haven't encountered any issues so far.
However, I have a question: what value should planid have when we
execute the standard planner without using a planner_hook? For example,
if pg_stat_statementsis disabled, would planid default to zero? If zero
is the expected default, should we explicitly write this behavior?
result->planid = 0;
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
On 18.03.2025 14:46, Ilia Evdokimov wrote:
However, I have a question: what value should planid have when we
execute the standard planner without using a planner_hook? For
example, if pg_stat_statementsis disabled, would planid default to
zero? If zero is the expected default, should we explicitly write this
behavior?result->planid = 0;
Sorry for spam. I found the answer this question in thread [0]/messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com.
[0]: /messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com
/messages/by-id/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
On Thu, Feb 20, 2025 at 05:03:19PM -0600, Sami Imseih wrote:
v2-0001:
This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
An extension
can offer its own view, similar to pg_stat_activity, which can include planId.Note that there are no documentation updates required here as we don't
have per-field documentation for pg_stat_get_activity.These 2 patches can probably be combined , but will leave them like
this for now.
The split makes more sense to me, and FWIW I see more value in 0001
that provides an anchor on the backend side for extensions to plug in
a plan ID that can be tracked across multiple layers. I am less
convinced about the need for 0002 that adds this information to
pg_stat_activity at this stage, as the getter routine in 0001 is
enough to know what's happening, roughly.
+void
+pgstat_report_plan_id(uint64 plan_id, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ return;
+
+ if (beentry->st_plan_id != 0 && !force)
+ return;
Right, and that's in line with the cases I've seen because we wanted
to be able to optionally control if only the top-level plan ID should
be stored or not, which is useful when going through the planner hook
at some sub-level. Hmm. Shouldn't we document all such expectations,
which are similar to a query ID? Our experience with pgss in this
area offers some insights, which I guess folks are rather used to and
could be applied to plan IDs.
In short, it would be nice to do 0001 for this release and start from
that (with more documentation around the expectations of these
interfaces). For 0002 we could always see later about exposing it in
a system view, even if it would offer the advantage of more
consistency across a single call of
pgstat_get_local_beentry_by_index() for each backend entry.
--
Michael
This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
An extension
can offer its own view, similar to pg_stat_activity, which can include planId.Note that there are no documentation updates required here as we don't
have per-field documentation for pg_stat_get_activity.These 2 patches can probably be combined , but will leave them like
this for now.The split makes more sense to me, and FWIW I see more value in 0001
that provides an anchor on the backend side for extensions to plug in
a plan ID that can be tracked across multiple layers.
Yes, I think this is a step in the right direction for this functionality.
I am less
convinced about the need for 0002 that adds this information to
pg_stat_activity at this stage, as the getter routine in 0001 is
enough to know what's happening, roughly.
I agree. I don't think exposing this in a system view is required at this
time. Once extensions start to use it, and it becomes obvious it should
be exposed in a system view, that discussion can be had at that
time.
Hmm. Shouldn't we document all such expectations,
which are similar to a query ID?
I had a lazy comment :)
* For the same reasons as the query identifiers (see above),
but, I went ahead and commented it similar to how we document
pgstat_report_query_id and pgstat_get_my_query_id routines.
attached is v2-0001
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v2-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patchapplication/x-patch; name=v2-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patchDownload+90-1
On Thu, Mar 20, 2025 at 09:46:55PM -0400, Sami Imseih wrote:
* For the same reasons as the query identifiers (see above),
but, I went ahead and commented it similar to how we document
pgstat_report_query_id and pgstat_get_my_query_id routines.
attached is v2-0001
Looks mostly OK from here.. Except for two things.
@@ -2016,6 +2017,17 @@ exec_bind_message(StringInfo input_message)
*/
cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+
+ if (plan->planId != UINT64CONST(0))
+ {
+ pgstat_report_plan_id(plan->planId, false);
+ break;
+ }
+ }
In exec_bind_message(), the comment at the top of PortalDefineQuery()
tells to not put any code between this call and the GetCachedPlan()
that could issue an error. pgstat_report_plan_id() is OK, but I'd
rather play it safe and set the ID once the portal is defined, based
on portal->stmts instead. That's the same as your patch, but slightly
safer in the long-term, especially if pgstat_report_plan_id() is
twisted in such a way that it introduces a path where it ERRORs.
planner() is the sole place in the core code where the planner hook
can be called. Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query? At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.
bind and execute message paths are OK as they stand, where we set a
plan ID once their portal is defined from its planned statements.
With some adjustments to some comments and the surroundings of the
code, I get the attached. What do you think?
--
Michael
Attachments:
v3-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patchtext/x-diff; charset=us-asciiDownload+95-1
In exec_bind_message(), the comment at the top of PortalDefineQuery()
tells to not put any code between this call and the GetCachedPlan()
that could issue an error. pgstat_report_plan_id() is OK, but I'd
rather play it safe and set the ID once the portal is defined, based
on portal->stmts instead. That's the same as your patch, but slightly
safer in the long-term, especially if pgstat_report_plan_id() is
twisted in such a way that it introduces a path where it ERRORs.
Yes that makes sense. As long as we report plan_id before
PortalStart, for obvious reasons.
planner() is the sole place in the core code where the planner hook
can be called. Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query? At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.
I agree. I was just thinking we rely on the exec_ routines to report the plan_id
at the start. But, besides the good reason you give, reporting
(slightly) earlier is
better for monitoring tools; as it reduces the likelihood they find an empty
plan_id.
Overall, v3 LGTM
--
Sami Imseih
Amazon Web Services (AWS)
On Fri, Mar 21, 2025 at 09:25:24AM -0400, Sami Imseih wrote:
planner() is the sole place in the core code where the planner hook
can be called. Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query? At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.I agree. I was just thinking we rely on the exec_ routines to report the plan_id
at the start. But, besides the good reason you give, reporting
(slightly) earlier is
better for monitoring tools; as it reduces the likelihood they find an empty
plan_id.
Yep. pgstat_report_plan_id() is not something that extensions should
do, but they should report the planId in the PlannedStmt and let the
backend do the rest.
Overall, v3 LGTM
Thanks. If anybody has any objections and/or comments, now would be a
good time. I'll revisit this thread at the beginning of next week.
--
Michael
On 3/22/25 06:48, Michael Paquier wrote:
On Fri, Mar 21, 2025 at 09:25:24AM -0400, Sami Imseih wrote:
planner() is the sole place in the core code where the planner hook
can be called. Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query? At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.I agree. I was just thinking we rely on the exec_ routines to report the plan_id
at the start. But, besides the good reason you give, reporting
(slightly) earlier is
better for monitoring tools; as it reduces the likelihood they find an empty
plan_id.Yep. pgstat_report_plan_id() is not something that extensions should
do, but they should report the planId in the PlannedStmt and let the
backend do the rest.Overall, v3 LGTM
Thanks. If anybody has any objections and/or comments, now would be a
good time. I'll revisit this thread at the beginning of next week.
The last time I wrote the email, I mistakenly thought about
plan_node_id. I apologize for that.
planId actually looks less controversial than queryId or plan_node_id.
At the same time, it is not free from the different logic that
extensions may incorporate into this value: I can imagine, for example,
the attempt of uniquely numbering plans related to the same queryId,
plain hashing of the plan tree, hashing without constants, etc.
So, it seems that extensions may conflict with the same field. Are we
sure that will happen if they are loaded in different orders in
neighbouring backends?
I'm not very good at stat collector guts, but would it be possible to
allow extensions to report their state in standard tuple format? In that
case, doubts about queryId or planId may be resolved.
--
regards, Andrei Lepikhov
On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
planId actually looks less controversial than queryId or plan_node_id. At
the same time, it is not free from the different logic that extensions may
incorporate into this value: I can imagine, for example, the attempt of
uniquely numbering plans related to the same queryId, plain hashing of the
plan tree, hashing without constants, etc.
One plan ID should have one single query ID, but the opposite may be
not necessarily true: a query ID could be associated with multiple
plan patterns (aka multiple plan IDs). What this offers is that we
know about which plan one query is currently using in live, for a
given query ID.
So, it seems that extensions may conflict with the same field. Are we sure
that will happen if they are loaded in different orders in neighbouring
backends?
Depends on what kind of computation once wants to do, and surely
without some coordination across the extensions you are using these
cannot be shared, but it's no different from the concept of a query
ID. The use cases I've seen where this field is useful is when
splitting code that uses the planner hook for several purposes across
more than one extension. Three of them which are independent, still
want to treat know about a plan ID:
- Set/get an existing node tree plan based on a specific ID.
- Hash calculation of a tree (like Lukas proposal).
- Statistics related to the plan tree used (aka custom cumulative
stats play here).
I'm not very good at stat collector guts, but would it be possible to allow
extensions to report their state in standard tuple format? In that case,
doubts about queryId or planId may be resolved.
I am not exactly sure what you mean here. We are only going to have
one plan ID set for each query execution. Any stats plan data related
to a plan ID surely needs to be upper-bounded if you don't want to
bloat pgstats, with the query ID of the query string it relates to
stored in it and the plan ID used as hash key, but it does not change
that we're only going to always have one single plan ID in a
PlannedStmt or in a backend entry doing a query execution, like a
query ID.
--
Michael
On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
planId actually looks less controversial than queryId or plan_node_id. At
the same time, it is not free from the different logic that extensions may
incorporate into this value: I can imagine, for example, the attempt of
uniquely numbering plans related to the same queryId, plain hashing of the
plan tree, hashing without constants, etc.
I think plan_node_id is probably the least controversial because that value
comes straight from core, and different extensions cannot have their own
interpretation of what that value could be.
Computing a plan_id is even more open for interpretation than it is
for query_id perhaps, which is why giving this ability to extensions
will be useful. Different extensions may choose to compute a plan_id
different ways, but they all should be able to push this value into core,
and this is what this patch will allow for.
FWIW, Lukas did start a Wiki [0]https://wiki.postgresql.org/wiki/Plan_ID_Jumbling to open the discussion for what parts
of the plan should be used to compute a plan_id, and maybe we can
in the future compite a plan_id in core by default.
So, it seems that extensions may conflict with the same field. Are we sure
that will happen if they are loaded in different orders in neighbouring
backends?Depends on what kind of computation once wants to do, and surely
without some coordination across the extensions you are using these
cannot be shared, but it's no different from the concept of a query
ID.
correct. This was also recently raised in the "making Explain extensible" [0]https://wiki.postgresql.org/wiki/Plan_ID_Jumbling
thread also. That is the nature of extensions, and coordination is required.
[0]: https://wiki.postgresql.org/wiki/Plan_ID_Jumbling
[1]: /messages/by-id/CA+TgmoZ8sTodL-orXQm51_npNxuDAS86BS5kC8t0LULneShRbg@mail.gmail.com
--
Sami Imseih
Amazon Web Services (AWS)
On 3/23/25 01:01, Michael Paquier wrote:
On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
planId actually looks less controversial than queryId or plan_node_id. At
the same time, it is not free from the different logic that extensions may
incorporate into this value: I can imagine, for example, the attempt of
uniquely numbering plans related to the same queryId, plain hashing of the
plan tree, hashing without constants, etc.One plan ID should have one single query ID,
I'm not sure I understand what do you mean by that. QueryId reflects
syntactic structure of the query, but planId - semantics. For example:
SELECT relname FROM pg_class JOIN pg_am USING (oid);
SELECT relname FROM pg_am JOIN pg_class USING (oid);
have different queryIds: -6493293269785547447 and 4243743071053231833
but the same plan:
Hash Join
Hash Cond: (pg_class.oid = pg_am.oid)
-> Seq Scan on pg_class
-> Hash
-> Seq Scan on pg_am
You can invent multiple other cases here - remember query tree
transformations we made before optimisation.
but the opposite may be
not necessarily true: a query ID could be associated with multiple
plan patterns (aka multiple plan IDs). What this offers is that we
know about which plan one query is currently using in live, for a
given query ID.
Okay, as I've said before, it seems practical. I just worry because I
see that people don't understand that queryId is a more or less
arbitrary value that may or may not group queries that differ only by
constants to a single "Query Class". I think the same attitude will be
paid to planId. It is okay for statistics. However, non-statistical
extensions need more guarantees and want to put different logic into
these values.
For now, we have examples of only statistical extensions in open-source
and may live with a proposed solution.
So, it seems that extensions may conflict with the same field. Are we sure
that will happen if they are loaded in different orders in neighbouring
backends?Depends on what kind of computation once wants to do, and surely
without some coordination across the extensions you are using these
cannot be shared, but it's no different from the concept of a query
ID.
Hmm, queryId generation code lies in the core and we already came to
terms that this field has only a statistical purpose. Do you want to
commit planId generation code?
--
regards, Andrei Lepikhov
On 3/23/25 04:22, Sami Imseih wrote:
On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
planId actually looks less controversial than queryId or plan_node_id. At
the same time, it is not free from the different logic that extensions may
incorporate into this value: I can imagine, for example, the attempt of
uniquely numbering plans related to the same queryId, plain hashing of the
plan tree, hashing without constants, etc.I think plan_node_id is probably the least controversial because that value
comes straight from core, and different extensions cannot have their own
interpretation of what that value could be.
I meant the proposal to let extensions calculate this field. Someone may
want node_id to be unique inside the plan; for someone - it should
reflect the nature of the node, and it may be the same even inside one
plan, etc. In this case, it is highly controversial.
Computing a plan_id is even more open for interpretation than it is
for query_id perhaps, which is why giving this ability to extensions
will be useful. Different extensions may choose to compute a plan_id
different ways, but they all should be able to push this value into core,
and this is what this patch will allow for.
That's why I worry about allowing extensions to compute it. Remember:
for now, an extension can't be sure that a conflicting one is not
already loaded in the backend. The code [1]/messages/by-id/441661.1742683797@sss.pgh.pa.us may relieve this problem.
[1]: /messages/by-id/441661.1742683797@sss.pgh.pa.us
/messages/by-id/441661.1742683797@sss.pgh.pa.us
--
regards, Andrei Lepikhov