Expose custom planning data in EXPLAIN
Hi,
Background and motivation
-------------------------
This feature is inspired by pg_overexplain, the new EXPLAIN-related
hooks, and the recent change to the Memoize node representation
(4bc62b86849).
Based on user reports from production systems, we typically receive
no more than an 'explain verbose analyse' output to identify the origin
of an issue. It is obvious to me that the level of detail in the EXPLAIN
format will never be enough. Moreover, each new parameter that adds
information also complicates life for some people, flooding the screen
with data they do not need.
With extensible EXPLAIN options and per-node/summary hooks, we can add
as many details as needed using modules. However, there is one
limitation: we can only explore Plan and PlanState data. If something
isn't transferred from the planning stage to the plan, we won't have a
way to expose this data.
For example, it is sometimes unclear why the optimiser chose
IncrementalSort or rejected HashJoin, as we don't know the ngroups
estimate used at that specific point of the plan.
Design Overview
---------------
It appears that the only two changes required to enable the feature are
a hook and a field in the Plan node. In this patch, I have chosen to add
the hook to the copy_generic_path_info routine to limit its usage for
tracking purposes only. Also, I extended its interface with the
PlannerInfo pointer, which may be helpful in many cases. The new extlist
field in the Plan structure should contain (by convention) extensible
nodes only to let modules correctly pick their data. Also, it simplifies
the logic of the node serialisation.
An additional motivation for choosing Extensible Node is its lack of
core usage, which makes it seem unpolished and requires copying a
significant amount of code to use. This patch highlights this imperfection.
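To make the mechanics concrete, here is a minimal standalone sketch of the
idea, not the patch itself. It assumes nothing from the PostgreSQL headers:
DemoPlan, DemoPath, DemoExtNode, demo_copy_hook and the other demo_* names
are hypothetical stand-ins for Plan, Path, ExtensibleNode and
copy_path_info_hook. It only shows the flow: core copies the generic fields,
calls the hook, the module attaches a name-tagged node, and the EXPLAIN side
later finds its own node by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ExtensibleNode kept in Plan->extlist. */
typedef struct DemoExtNode
{
	const char *extnodename;	/* identifies the owning module */
	struct DemoExtNode *next;	/* simple singly linked "extlist" */
	double		value;			/* module-specific payload */
} DemoExtNode;

/* Hypothetical stand-ins for Plan and Path. */
typedef struct DemoPlan
{
	double		total_cost;
	DemoExtNode *extlist;		/* NULL when no module attached anything */
} DemoPlan;

typedef struct DemoPath
{
	double		total_cost;
	double		est_groups;		/* planner-only estimate, normally lost */
} DemoPath;

typedef void (*demo_copy_hook_type) (DemoPlan *dest, const DemoPath *src);
demo_copy_hook_type demo_copy_hook = NULL;

/* Core side: copy generic fields, then let a module annotate the plan. */
void
demo_copy_generic_path_info(DemoPlan *dest, const DemoPath *src)
{
	dest->total_cost = src->total_cost;
	if (demo_copy_hook)
		demo_copy_hook(dest, src);
}

/* Module side: attach a name-tagged node carrying the planner estimate. */
void
demo_module_hook(DemoPlan *dest, const DemoPath *src)
{
	DemoExtNode *node = malloc(sizeof(DemoExtNode));

	if (node == NULL)
		return;					/* the annotation is optional; skip it */
	node->extnodename = "demo_module";
	node->value = src->est_groups;
	node->next = dest->extlist;
	dest->extlist = node;
}

/* EXPLAIN side: a module picks out only the node carrying its own name. */
const DemoExtNode *
demo_find_ext(const DemoPlan *plan, const char *name)
{
	assert(plan != NULL && name != NULL);
	for (const DemoExtNode *n = plan->extlist; n != NULL; n = n->next)
		if (strcmp(n->extnodename, name) == 0)
			return n;
	return NULL;
}
```

With the hook unset, demo_copy_generic_path_info leaves extlist empty, so
the only cost is the extra pointer. Once demo_module_hook is installed,
demo_find_ext(&plan, "demo_module") returns the stored estimate, and a
lookup under any other name returns NULL.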
Tests
-----
To demonstrate its real-life application, I added an example to
pg_overexplain. Here, an ngroups value is computed, stored in the Plan
node, and exposed in EXPLAIN. It also serves as a test for the
ExtensibleNode machinery.
Downsides
---------
1. Growth of the Plan node.
2. Reading/writing the extensible node: what if the reading worker (or
backend?) doesn't have the module installed?
3. The placement of the hook call.
The first issue is quite limited because only one version of the plan
exists, in contrast to the multiple paths.
The second issue is a little more complicated. However, I believe the
issue could be resolved by allowing extensions to determine the logic
for serialising their ExtensibleNode.
The chosen placement of the hook is quite restrictive. It does not
permit extensions to alter the final plan or disrupt its consistency,
except for some cost data. However, it does allow for tracking the
decisions made during the planning phase.
See the patch attached.
--
regards, Andrei Lepikhov
Attachments:
v0-0001-Introduce-a-create-plan-hook.patch (text/plain; charset=UTF-8)
From 02599b89bf3514be3b3ede69e4203379e4e2d1fb Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Wed, 13 Aug 2025 12:24:20 +0200
Subject: [PATCH v0] Introduce a create plan hook.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The optimisation process involves multiple parameters and estimations.
Sometimes it is not apparent why one strategy is chosen over another.
Postgres continuously extends the number of parameters, as reflected in the
EXPLAIN output. The last time this was extended was when the Memoize node
representation was updated by commit 4bc62b86849.
Sometimes a parameter is displayed by EXPLAIN by default, although its
usefulness may be narrow. The extensibility added in Postgres 18 allows changing
the summary part as well as each node representation. A slight adjustment is
required to enable extensions to transfer information from the planning stage
and display it in EXPLAIN output, gated by an option.
With this patch, a hook is added that lets extensions act during the
transformation of a path node into the final plan node. The Plan structure is
extended by an extlist pointer, which is designated to store the extension's
data in the form of an ExtensibleNode. An example exposing the predicted ngroups
in an IncrementalSort node is added to the pg_overexplain extension.
---
.../expected/pg_overexplain.out | 20 ++
contrib/pg_overexplain/pg_overexplain.c | 211 ++++++++++++++++++
contrib/pg_overexplain/sql/pg_overexplain.sql | 13 ++
src/backend/optimizer/plan/createplan.c | 95 ++++----
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/planmain.h | 5 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 306 insertions(+), 45 deletions(-)
diff --git a/contrib/pg_overexplain/expected/pg_overexplain.out b/contrib/pg_overexplain/expected/pg_overexplain.out
index 6de02323d7c..be2ed2aad52 100644
--- a/contrib/pg_overexplain/expected/pg_overexplain.out
+++ b/contrib/pg_overexplain/expected/pg_overexplain.out
@@ -487,3 +487,23 @@ INSERT INTO vegetables (name, genus) VALUES ('broccoflower', 'brassica');
Result RTIs: 1
(14 rows)
+-- A test case for the number of groups used in cost estimation of
+-- an incremental sort node
+CREATE TABLE incremental_groups (x integer, y integer);
+INSERT INTO incremental_groups (x,y)
+ SELECT gs,gs FROM generate_series(1,1000) AS gs;
+VACUUM ANALYZE incremental_groups;
+CREATE INDEX ON incremental_groups (x);
+EXPLAIN (COSTS OFF, PLAN_DETAILS)
+SELECT * FROM incremental_groups ORDER BY x, y LIMIT 10;
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: x, y
+ Presorted Key: x
+ Estimated Groups: 1000
+ -> Index Scan using incremental_groups_x_idx on incremental_groups
+(6 rows)
+
+DROP TABLE incremental_groups;
diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index de824566f8c..8010474fdd0 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -16,10 +16,15 @@
#include "commands/explain_format.h"
#include "commands/explain_state.h"
#include "fmgr.h"
+#include "nodes/extensible.h"
+#include "nodes/readfuncs.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/planmain.h"
#include "parser/parsetree.h"
#include "storage/lock.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
PG_MODULE_MAGIC_EXT(
.name = "pg_overexplain",
@@ -30,6 +35,7 @@ typedef struct
{
bool debug;
bool range_table;
+ bool plan_details;
} overexplain_options;
static overexplain_options *overexplain_ensure_options(ExplainState *es);
@@ -37,6 +43,10 @@ static void overexplain_debug_handler(ExplainState *es, DefElem *opt,
ParseState *pstate);
static void overexplain_range_table_handler(ExplainState *es, DefElem *opt,
ParseState *pstate);
+static void overexplain_plan_details_handler(ExplainState *es, DefElem *opt,
+ ParseState *pstate);
+static void overexplain_copy_path_info_hook(PlannerInfo *root,
+ Plan *dest, Path *src);
static void overexplain_per_node_hook(PlanState *planstate, List *ancestors,
const char *relationship,
const char *plan_name,
@@ -50,6 +60,7 @@ static void overexplain_per_plan_hook(PlannedStmt *plannedstmt,
static void overexplain_debug(PlannedStmt *plannedstmt, ExplainState *es);
static void overexplain_range_table(PlannedStmt *plannedstmt,
ExplainState *es);
+static void overexplain_node_plan_details(Plan *plan, ExplainState *es);
static void overexplain_alias(const char *qlabel, Alias *alias,
ExplainState *es);
static void overexplain_bitmapset(const char *qlabel, Bitmapset *bms,
@@ -58,9 +69,90 @@ static void overexplain_intlist(const char *qlabel, List *list,
ExplainState *es);
static int es_extension_id;
+static copy_path_info_hook_type prev_copy_path_info_hook;
static explain_per_node_hook_type prev_explain_per_node_hook;
static explain_per_plan_hook_type prev_explain_per_plan_hook;
+/*
+ * The extension's ExtensibleNode stuff to store planner data into the Plan node
+ * XXX: there is a lot of stuff that must be copied each time we want to
+ * introduce an extensible node. Maybe just export all those macros?
+ */
+typedef struct OverExplainNode
+{
+ ExtensibleNode node;
+
+ double input_groups;
+} OverExplainNode;
+
+
+#define strtobool(x) ((*(x) == 't') ? true : false)
+
+#define nullable_string(token,length) \
+ ((length) == 0 ? NULL : debackslash(token, length))
+
+#define booltostr(x) ((x) ? "true" : "false")
+
+
+static void
+OEnodeCopy(struct ExtensibleNode *enew, const struct ExtensibleNode *eold)
+{
+ OverExplainNode *new = (OverExplainNode *) enew;
+ OverExplainNode *old = (OverExplainNode *) eold;
+
+ new->input_groups = old->input_groups;
+
+ enew = (ExtensibleNode *) new;
+}
+
+static bool
+OEnodeEqual(const struct ExtensibleNode *a, const struct ExtensibleNode *b)
+{
+ OverExplainNode *a1 = (OverExplainNode *) a;
+ OverExplainNode *b1 = (OverExplainNode *) b;
+
+ return (a1->input_groups == b1->input_groups);
+}
+
+
+/* Write a float field --- caller must give format to define precision */
+#define WRITE_FLOAT_FIELD(fldname,format) \
+ appendStringInfo(str, " :" CppAsString(fldname) " " format, node->fldname)
+
+static void
+OEnodeOut(struct StringInfoData *str, const struct ExtensibleNode *enode)
+{
+ OverExplainNode *node = (OverExplainNode *) enode;
+
+ WRITE_FLOAT_FIELD(input_groups, "%.0f");
+}
+
+/* Read a float field */
+#define READ_FLOAT_FIELD(fldname) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ token = pg_strtok(&length); /* get field value */ \
+ node->fldname = atof(token)
+
+static void
+OEnodeRead(struct ExtensibleNode *enode)
+{
+ OverExplainNode *node = (OverExplainNode *) enode;
+ const char *token;
+ int length;
+
+ READ_FLOAT_FIELD(input_groups);
+}
+
+static const ExtensibleNodeMethods method =
+{
+ .extnodename = "pg_overexplain",
+ .node_size = sizeof(OverExplainNode),
+ .nodeCopy = OEnodeCopy,
+ .nodeEqual = OEnodeEqual,
+ .nodeOut = OEnodeOut,
+ .nodeRead = OEnodeRead
+};
+
/*
* Initialization we do when this module is loaded.
*/
@@ -74,12 +166,19 @@ _PG_init(void)
RegisterExtensionExplainOption("debug", overexplain_debug_handler);
RegisterExtensionExplainOption("range_table",
overexplain_range_table_handler);
+ RegisterExtensionExplainOption("plan_details",
+ overexplain_plan_details_handler);
+
+ prev_copy_path_info_hook = copy_path_info_hook;
+ copy_path_info_hook = overexplain_copy_path_info_hook;
/* Use the per-node and per-plan hooks to make our options do something. */
prev_explain_per_node_hook = explain_per_node_hook;
explain_per_node_hook = overexplain_per_node_hook;
prev_explain_per_plan_hook = explain_per_plan_hook;
explain_per_plan_hook = overexplain_per_plan_hook;
+
+ RegisterExtensibleNodeMethods(&method);
}
/*
@@ -125,6 +224,18 @@ overexplain_range_table_handler(ExplainState *es, DefElem *opt,
options->range_table = defGetBoolean(opt);
}
+/*
+ * Parse handler for EXPLAIN (PLAN_DETAILS).
+ */
+static void
+overexplain_plan_details_handler(ExplainState *es, DefElem *opt,
+ ParseState *pstate)
+{
+ overexplain_options *options = overexplain_ensure_options(es);
+
+ options->plan_details = defGetBoolean(opt);
+}
+
/*
* Print out additional per-node information as appropriate. If the user didn't
* specify any of the options we support, do nothing; else, print whatever is
@@ -240,6 +351,13 @@ overexplain_per_node_hook(PlanState *planstate, List *ancestors,
break;
}
}
+
+ /*
+ * If the "plan_details" option was specified, display information about
+ * the node planning decisions.
+ */
+ if (options->plan_details)
+ overexplain_node_plan_details(plan, es);
}
/*
@@ -671,6 +789,99 @@ overexplain_range_table(PlannedStmt *plannedstmt, ExplainState *es)
ExplainCloseGroup("Range Table", "Range Table", false, es);
}
+/*
+ * Provide detailed information about the planning decisions made by the
+ * optimiser.
+ */
+static void
+overexplain_node_plan_details(Plan *plan, ExplainState *es)
+{
+ if (!IsA(plan, IncrementalSort) || plan->extlist == NIL)
+ return;
+
+ /*
+ * Pass through the extension list. By convention, each element must be
+ * an extensible node that enables the owner's identification.
+ */
+ foreach_node(ExtensibleNode, enode, plan->extlist)
+ {
+ OverExplainNode *data;
+
+ if (strcmp(enode->extnodename, "pg_overexplain") != 0)
+ continue;
+
+ /* Add the information to the explain */
+ data = (OverExplainNode *) enode;
+ ExplainPropertyFloat("Estimated Groups", NULL, data->input_groups, 0, es);
+ }
+
+ return;
+}
+
+/*
+ * Gather/calculate necessary optimisation information about the path and
+ * store it into the Plan node.
+ *
+ * At the moment this is an adapted copy of the optimiser code that lets
+ * the extension recompute the numbers used during the optimisation phase.
+ */
+static void
+overexplain_copy_path_info_hook(PlannerInfo *root, Plan *dest, Path *src)
+{
+ IncrementalSortPath *sort_path;
+ double input_tuples;
+ double input_groups;
+ ListCell *lc;
+ List *presortedExprs = NIL;
+ bool unknown_varno = false;
+ OverExplainNode *data;
+
+ if (!IsA(src, IncrementalSortPath))
+ return;
+
+ data = (OverExplainNode *) newNode(sizeof(OverExplainNode), T_ExtensibleNode);
+ sort_path = (IncrementalSortPath *) src;
+ Assert(sort_path->spath.subpath->pathkeys != NIL);
+
+ input_tuples = sort_path->spath.subpath->rows;
+
+ if (input_tuples < 2.0)
+ input_tuples = 2.0;
+ input_groups = Min(input_tuples, DEFAULT_NUM_DISTINCT);
+
+ foreach(lc, sort_path->spath.subpath->pathkeys)
+ {
+ PathKey *key = (PathKey *) lfirst(lc);
+ EquivalenceMember *member = (EquivalenceMember *)
+ linitial(key->pk_eclass->ec_members);
+
+ /*
+ * Check if the expression contains Var with "varno 0" so that we
+ * don't call estimate_num_groups in that case.
+ */
+ if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+ {
+ unknown_varno = true;
+ break;
+ }
+
+ /* expression not containing any Vars with "varno 0" */
+ presortedExprs = lappend(presortedExprs, member->em_expr);
+
+ if (foreach_current_index(lc) + 1 >= sort_path->nPresortedCols)
+ break;
+ }
+
+ /* Estimate the number of groups with equal presorted keys. */
+ if (!unknown_varno)
+ input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+ NULL, NULL);
+
+ data->node.extnodename = "pg_overexplain";
+ data->input_groups = input_groups;
+ dest->extlist = lappend(dest->extlist, data);
+}
+
/*
* Emit a text property describing the contents of an Alias.
*
diff --git a/contrib/pg_overexplain/sql/pg_overexplain.sql b/contrib/pg_overexplain/sql/pg_overexplain.sql
index 42e275ac2f9..ec7e344f42a 100644
--- a/contrib/pg_overexplain/sql/pg_overexplain.sql
+++ b/contrib/pg_overexplain/sql/pg_overexplain.sql
@@ -110,3 +110,16 @@ SELECT * FROM vegetables WHERE genus = 'daucus';
-- Also test a case that involves a write.
EXPLAIN (RANGE_TABLE, COSTS OFF)
INSERT INTO vegetables (name, genus) VALUES ('broccoflower', 'brassica');
+
+-- A test case for the number of groups used in cost estimation of
+-- an incremental sort node
+CREATE TABLE incremental_groups (x integer, y integer);
+INSERT INTO incremental_groups (x,y)
+ SELECT gs,gs FROM generate_series(1,1000) AS gs;
+VACUUM ANALYZE incremental_groups;
+CREATE INDEX ON incremental_groups (x);
+
+EXPLAIN (COSTS OFF, PLAN_DETAILS)
+SELECT * FROM incremental_groups ORDER BY x, y LIMIT 10;
+
+DROP TABLE incremental_groups;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9fd5c31edf2..c2f7b1ecf31 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -72,6 +72,7 @@
#define CP_LABEL_TLIST 0x0004 /* tlist must contain sortgrouprefs */
#define CP_IGNORE_TLIST 0x0008 /* caller will replace tlist */
+copy_path_info_hook_type copy_path_info_hook = NULL;
static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
int flags);
@@ -175,7 +176,7 @@ static Node *fix_indexqual_clause(PlannerInfo *root,
static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
static List *get_switched_clauses(List *clauses, Relids outerrelids);
static List *order_qual_clauses(PlannerInfo *root, List *clauses);
-static void copy_generic_path_info(Plan *dest, Path *src);
+static void copy_generic_path_info(PlannerInfo *root, Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static void label_sort_with_costsize(PlannerInfo *root, Sort *plan,
double limit_tuples);
@@ -1253,7 +1254,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
false)),
NULL);
- copy_generic_path_info(plan, (Path *) best_path);
+ copy_generic_path_info(root, plan, (Path *) best_path);
return plan;
}
@@ -1437,7 +1438,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -1481,7 +1482,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* prepare_sort_from_pathkeys on it before we do so on the individual
* child plans, to make cross-checking the sort info easier.
*/
- copy_generic_path_info(plan, (Path *) best_path);
+ copy_generic_path_info(root, plan, (Path *) best_path);
plan->targetlist = tlist;
plan->qual = NIL;
plan->lefttree = NULL;
@@ -1651,7 +1652,7 @@ create_group_result_plan(PlannerInfo *root, GroupResultPath *best_path)
plan = make_result(tlist, (Node *) quals, NULL);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -1676,7 +1677,7 @@ create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path)
plan = make_project_set(tlist, subplan);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -1704,7 +1705,7 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
plan = make_material(subplan);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -1759,7 +1760,7 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
best_path->est_entries, keyparamids, best_path->est_calls,
best_path->est_unique_keys, best_path->est_hit_ratio);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -1960,7 +1961,7 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
}
/* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
+ copy_generic_path_info(root, plan, &best_path->path);
return plan;
}
@@ -1995,7 +1996,7 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path)
best_path->single_copy,
subplan);
- copy_generic_path_info(&gather_plan->plan, &best_path->path);
+ copy_generic_path_info(root, &gather_plan->plan, &best_path->path);
/* use parallel mode for parallel plans. */
root->glob->parallelModeNeeded = true;
@@ -2024,7 +2025,7 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
gm_plan = makeNode(GatherMerge);
gm_plan->plan.targetlist = tlist;
gm_plan->num_workers = best_path->num_workers;
- copy_generic_path_info(&gm_plan->plan, &best_path->path);
+ copy_generic_path_info(root, &gm_plan->plan, &best_path->path);
/* Assign the rescan Param. */
gm_plan->rescan_param = assign_special_exec_param(root);
@@ -2150,7 +2151,7 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
/* We need a Result node */
plan = (Plan *) make_result(tlist, NULL, subplan);
- copy_generic_path_info(plan, (Path *) best_path);
+ copy_generic_path_info(root, plan, (Path *) best_path);
}
return plan;
@@ -2251,7 +2252,7 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
IS_OTHER_REL(best_path->subpath->parent) ?
best_path->path.parent->relids : NULL);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2277,7 +2278,7 @@ create_incrementalsort_plan(PlannerInfo *root, IncrementalSortPath *best_path,
best_path->spath.path.parent->relids : NULL,
best_path->nPresortedCols);
- copy_generic_path_info(&plan->sort.plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->sort.plan, (Path *) best_path);
return plan;
}
@@ -2316,7 +2317,7 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
subplan->targetlist),
subplan);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2344,7 +2345,7 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
best_path->path.pathkeys,
best_path->numkeys);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2388,7 +2389,7 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
best_path->transitionSpace,
subplan);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2585,7 +2586,7 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path)
subplan);
/* Copy cost data from Path to Plan */
- copy_generic_path_info(&plan->plan, &best_path->path);
+ copy_generic_path_info(root, &plan->plan, &best_path->path);
}
return (Plan *) plan;
@@ -2644,7 +2645,7 @@ create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path)
plan = make_result(tlist, (Node *) best_path->quals, NULL);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
/*
* During setrefs.c, we'll need to replace references to the Agg nodes
@@ -2748,7 +2749,7 @@ create_windowagg_plan(PlannerInfo *root, WindowAggPath *best_path)
best_path->topwindow,
subplan);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2788,7 +2789,7 @@ create_setop_plan(PlannerInfo *root, SetOpPath *best_path, int flags)
best_path->groupList,
numGroups);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2824,7 +2825,7 @@ create_recursiveunion_plan(PlannerInfo *root, RecursiveUnionPath *best_path)
best_path->distinctList,
numGroups);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2847,7 +2848,7 @@ create_lockrows_plan(PlannerInfo *root, LockRowsPath *best_path,
plan = make_lockrows(subplan, best_path->rowMarks, best_path->epqParam);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2888,7 +2889,7 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
best_path->mergeJoinConditions,
best_path->epqParam);
- copy_generic_path_info(&plan->plan, &best_path->path);
+ copy_generic_path_info(root, &plan->plan, &best_path->path);
return plan;
}
@@ -2942,7 +2943,7 @@ create_limit_plan(PlannerInfo *root, LimitPath *best_path, int flags)
best_path->limitOption,
numUniqkeys, uniqColIdx, uniqOperators, uniqCollations);
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(root, &plan->plan, (Path *) best_path);
return plan;
}
@@ -2988,7 +2989,7 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
scan_clauses,
scan_relid);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -3034,7 +3035,7 @@ create_samplescan_plan(PlannerInfo *root, Path *best_path,
scan_relid,
tsc);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -3235,7 +3236,7 @@ create_indexscan_plan(PlannerInfo *root,
indexorderbyops,
best_path->indexscandir);
- copy_generic_path_info(&scan_plan->plan, &best_path->path);
+ copy_generic_path_info(root, &scan_plan->plan, &best_path->path);
return scan_plan;
}
@@ -3350,7 +3351,7 @@ create_bitmap_scan_plan(PlannerInfo *root,
bitmapqualorig,
baserelid);
- copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, &best_path->path);
return scan_plan;
}
@@ -3670,7 +3671,7 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
scan_relid,
tidquals);
- copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, &best_path->path);
return scan_plan;
}
@@ -3735,7 +3736,7 @@ create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
scan_relid,
tidrangequals);
- copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, &best_path->path);
return scan_plan;
}
@@ -3794,7 +3795,7 @@ create_subqueryscan_plan(PlannerInfo *root, SubqueryScanPath *best_path,
scan_relid,
subplan);
- copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, &best_path->path);
return scan_plan;
}
@@ -3837,7 +3838,7 @@ create_functionscan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_functionscan(tlist, scan_clauses, scan_relid,
functions, rte->funcordinality);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -3880,7 +3881,7 @@ create_tablefuncscan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_tablefuncscan(tlist, scan_clauses, scan_relid,
tablefunc);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -3924,7 +3925,7 @@ create_valuesscan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_valuesscan(tlist, scan_clauses, scan_relid,
values_lists);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -4018,7 +4019,7 @@ create_ctescan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_ctescan(tlist, scan_clauses, scan_relid,
plan_id, cte_param_id);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -4057,7 +4058,7 @@ create_namedtuplestorescan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_namedtuplestorescan(tlist, scan_clauses, scan_relid,
rte->enrname);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -4095,7 +4096,7 @@ create_resultscan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_result(tlist, (Node *) scan_clauses, NULL);
- copy_generic_path_info(&scan_plan->plan, best_path);
+ copy_generic_path_info(root, &scan_plan->plan, best_path);
return scan_plan;
}
@@ -4155,7 +4156,7 @@ create_worktablescan_plan(PlannerInfo *root, Path *best_path,
scan_plan = make_worktablescan(tlist, scan_clauses, scan_relid,
cteroot->wt_param_id);
- copy_generic_path_info(&scan_plan->scan.plan, best_path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, best_path);
return scan_plan;
}
@@ -4215,7 +4216,7 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
outer_plan);
/* Copy cost data from Path to Plan; no need to make FDW do this */
- copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+ copy_generic_path_info(root, &scan_plan->scan.plan, &best_path->path);
/* Copy user OID to access as; likewise no need to make FDW do this */
scan_plan->checkAsUser = rel->userid;
@@ -4360,7 +4361,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
* Copy cost data from Path to Plan; no need to make custom-plan providers
* do this
*/
- copy_generic_path_info(&cplan->scan.plan, &best_path->path);
+ copy_generic_path_info(root, &cplan->scan.plan, &best_path->path);
/* Likewise, copy the relids that are represented by this custom scan */
cplan->custom_relids = best_path->path.parent->relids;
@@ -4538,7 +4539,7 @@ create_nestloop_plan(PlannerInfo *root,
best_path->jpath.jointype,
best_path->jpath.inner_unique);
- copy_generic_path_info(&join_plan->join.plan, &best_path->jpath.path);
+ copy_generic_path_info(root, &join_plan->join.plan, &best_path->jpath.path);
return join_plan;
}
@@ -4892,7 +4893,7 @@ create_mergejoin_plan(PlannerInfo *root,
best_path->skip_mark_restore);
/* Costs of sort and material steps are included in path cost already */
- copy_generic_path_info(&join_plan->join.plan, &best_path->jpath.path);
+ copy_generic_path_info(root, &join_plan->join.plan, &best_path->jpath.path);
return join_plan;
}
@@ -5065,7 +5066,7 @@ create_hashjoin_plan(PlannerInfo *root,
best_path->jpath.jointype,
best_path->jpath.inner_unique);
- copy_generic_path_info(&join_plan->join.plan, &best_path->jpath.path);
+ copy_generic_path_info(root, &join_plan->join.plan, &best_path->jpath.path);
return join_plan;
}
@@ -5559,7 +5560,7 @@ order_qual_clauses(PlannerInfo *root, List *clauses)
* Also copy the parallel-related flags, which the executor *will* use.
*/
static void
-copy_generic_path_info(Plan *dest, Path *src)
+copy_generic_path_info(PlannerInfo *root, Plan *dest, Path *src)
{
dest->disabled_nodes = src->disabled_nodes;
dest->startup_cost = src->startup_cost;
@@ -5568,6 +5569,10 @@ copy_generic_path_info(Plan *dest, Path *src)
dest->plan_width = src->pathtarget->width;
dest->parallel_aware = src->parallel_aware;
dest->parallel_safe = src->parallel_safe;
+
+ /* Let an extension do additional work before finalizing the plan node */
+ if (copy_path_info_hook)
+ (copy_path_info_hook)(root, dest, src);
}
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 29d7732d6a0..843a4ced763 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -239,6 +239,12 @@ typedef struct Plan
*/
Bitmapset *extParam;
Bitmapset *allParam;
+
+ /*
+ * Intended to store any additional information an extension needs
+ * during or after the execution phase.
+ */
+ List *extlist;
} Plan;
/* ----------------
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..24143fc5514 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -25,6 +25,11 @@ extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
+/* Hook for plugins to act while the final path is converted to a plan node */
+typedef void (*copy_path_info_hook_type) (PlannerInfo *root,
+ Plan *dest, Path *src);
+extern PGDLLIMPORT copy_path_info_hook_type copy_path_info_hook;
+
/*
* prototypes for plan/planmain.c
*/
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..31da2aaaac0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3525,6 +3525,7 @@ contain_placeholder_references_context
convert_testexpr_context
copy_data_dest_cb
copy_data_source_cb
+copy_path_info_hook_type
core_YYSTYPE
core_yy_extra_type
core_yyscan_t
--
2.50.1
On Wed, Aug 13, 2025 at 9:51 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
It appears that the only two changes required to enable the feature are
a hook and a field in the Plan node. In this patch, I have chosen to add
the hook to the copy_generic_path_info routine to limit its usage for
tracking purposes only. Also, I extended its interface with the
PlannerInfo pointer, which may be helpful in many cases. The new extlist
field in the Plan structure should contain (by convention) extensible
nodes only to let modules correctly pick their data. Also, it simplifies
the logic of the node serialisation.
An additional motivation for choosing Extensible Node is its lack of
core usage, which makes it seem unpolished and requires copying a
significant amount of code to use. This patch highlights this imperfection.
This seems quite closely related to what I propose here:
/messages/by-id/CA+TgmoYxfg90rw13+JcYwn4dwSC+agw7o8-A+fA3M0fh96pg8w@mail.gmail.com
There are some differences. In my proposal, specifically in v3-0004, I
just add a single member to the PlannedStmt, and assume that the code
can find a way to jam all the state it cares about into that single
field, for example by creating a list of plan_node_id values and a
list of associated nodes that can carry the corresponding data.
Likewise, I just put a single hook in there, in v3-0003, to allow data
to be propagated from the plan-time data structures to that new
PlannedStmt member. In your proposal, by contrast, there's a place to
put extension-relevant information in every single Plan node, and a
hook call for every single plan node as well.
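To make the contrast concrete, here is a minimal standalone sketch of the two storage layouts described above. It uses mock types, not PostgreSQL's real structs; MockPlan, MockStmtExt and both lookup functions are hypothetical names introduced only for illustration, and neither patch necessarily stores its data this way.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins; these are NOT PostgreSQL's real structs. */
typedef struct MockPlan
{
	int			plan_node_id;	/* unique id within the statement */
	void	   *extlist;		/* per-node slot (per-Plan-node design) */
} MockPlan;

/* Single statement-level member: parallel arrays keyed by plan_node_id. */
typedef struct MockStmtExt
{
	int			nitems;
	int		   *plan_node_ids;
	void	  **payloads;
} MockStmtExt;

/* Statement-level design: search the side table for the node's id. */
void *
lookup_by_node_id(const MockStmtExt *ext, int plan_node_id)
{
	assert(ext != NULL);
	for (int i = 0; i < ext->nitems; i++)
	{
		if (ext->plan_node_ids[i] == plan_node_id)
			return ext->payloads[i];
	}
	return NULL;				/* nothing stored for this node */
}

/* Per-node design: the data hangs directly off the node. */
void *
lookup_in_node(const MockPlan *plan)
{
	assert(plan != NULL);
	return plan->extlist;
}
```

The linear search is only the simplest possible lookup; a real module would more likely use a hash table or an array indexed by plan_node_id.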
I think both approaches have some advantages. One advantage of my
proposal is that it's cheaper. Your proposal makes every Plan node 8
bytes larger even though most of the time that extra pointer will be
NULL. I have been yelled at in the past for proposing to increase the
size of Plan, so I'm a little reluctant to believe that it's OK to do
that here. It might be less relevant now, as I think before we might
have been just on the cusp of needing one more cache line for every
Plan node, and it doesn't look like that's true currently, so maybe it
wouldn't provoke as much objection, but I'm still nervous about the
idea. A related disadvantage of your approach is that it needs to
consider calling the hook function lots of times instead of just one
time, though perhaps that's too insignificant to bother about. Also,
with my approach it's possible to propagate information from
PlannerInfo or PlannerGlobal structs, not just individual Plan nodes.
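The size concern can be illustrated with a small standalone sketch. The two structs below are mock stand-ins that are far smaller than the real Plan node, and the 64-byte cache line is an assumption that holds on common x86-64 hardware but not everywhere. The sketch only shows that one trailing pointer adds sizeof(void *) bytes, and how to check whether a given size crosses into another cache line.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64		/* assumption: common on x86-64 */

/* Mock node: a stand-in, much smaller than the real Plan struct. */
typedef struct NodeBefore
{
	void	   *extParam;
	void	   *allParam;
} NodeBefore;

/* Same node with one extra trailing pointer, like the extlist field. */
typedef struct NodeAfter
{
	void	   *extParam;
	void	   *allParam;
	void	   *extlist;
} NodeAfter;

/* With only pointer members, the growth is exactly one pointer. */
static_assert(sizeof(NodeAfter) == sizeof(NodeBefore) + sizeof(void *),
			  "extra field costs one pointer");

/* Cache lines spanned by an object that starts on a line boundary. */
size_t
cache_lines_spanned(size_t nbytes)
{
	return (nbytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Comparing cache_lines_spanned(sizeof(Plan)) before and after the change, in a real build, would show whether the extra pointer actually costs another line.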
On the other hand, your proposal has usability advantages. If what
you're trying to do is save some details for every Plan node, my
approach requires you to run around and walk the plan tree and
marshall the data that you want to save, whereas your approach allows
you to do things in a more straightforward way. I think this actually
points to a deeper flaw in my approach: sure, you can run around and
look at the best path and the final plan and save whatever you want,
but how do you connect a path node to the corresponding plan node? The
Plan objects have a plan_node_id value, but the path objects don't
yet, and it's not real obvious how to match things up. Your approach
solves this problem by putting a callback in a place where it gets
passed the Path and the corresponding Plan at the same time. That's
extremely convenient.
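A minimal standalone sketch of that pairing follows. It uses mock types, and the hook here omits the PlannerInfo argument that the patch's copy_path_info_hook_type takes. The ngroups field on the mock path and the save_ngroups callback are hypothetical, chosen only to echo the ngroups example from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the planner structs; NOT the real definitions. */
typedef struct MockPath
{
	double		rows;
	double		ngroups;		/* hypothetical planner-only estimate */
} MockPath;

typedef struct MockPlan
{
	double		plan_rows;
	double		saved_ngroups;	/* hypothetical slot for extension data */
} MockPlan;

/* Hook type modelled loosely on the patch's copy_path_info_hook_type. */
typedef void (*mock_copy_path_info_hook_type) (MockPlan *dest,
											   const MockPath *src);
mock_copy_path_info_hook_type mock_copy_path_info_hook = NULL;

/* Core side: copy generic fields, then let an extension act. */
void
mock_copy_generic_path_info(MockPlan *dest, const MockPath *src)
{
	assert(dest != NULL && src != NULL);
	dest->plan_rows = src->rows;
	if (mock_copy_path_info_hook)
		mock_copy_path_info_hook(dest, src);
}

/* Extension side: keep an estimate that would otherwise be lost. */
void
save_ngroups(MockPlan *dest, const MockPath *src)
{
	dest->saved_ngroups = src->ngroups;
}
```

Because the callback receives both nodes in one call, the extension never has to work out afterwards which path produced which plan node.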
Another thing that is different is that my patch series is clearer
about how multiple unrelated planner extensions are intended to
coexist. That's not a fundamental advantage of my approach, because
the same idea could be integrated into what you've done; it's only a
difference in how things stand as currently proposed.
My overall feeling is that we should try to come up with a unified
approach here. I'm not sure exactly what it should look like, though.
I think the strongest part of your proposal is the fact that it
connects each Path node to the corresponding Plan node in a very clear
way, and I think that the weakest part of your proposal is that it
makes each Plan node larger. I would be curious to hear what others
think.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 9/9/2025 17:32, Robert Haas wrote:
On Wed, Aug 13, 2025 at 9:51 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
This seems quite closely related to what I propose here:
/messages/by-id/CA+TgmoYxfg90rw13+JcYwn4dwSC+agw7o8-A+fA3M0fh96pg8w@mail.gmail.com
I'd say it is another (mostly opposite) view of the feature.
There are some differences. In my proposal, specifically in v3-0004, I
just add a single member to the PlannedStmt, and assume that the code
can find a way to jam all the state it cares about into that single
field, for example by creating a list of plan_node_id values and a
list of associated nodes that can carry the corresponding data.
It may work if it is backed by a proper hook at a place where we already
have the final path tree and a pointer to the plan node.
Likewise, I just put a single hook in there, in v3-0003, to allow data
to be propagated from the plan-time data structures to that new
PlannedStmt member. In your proposal, by contrast, there's a place to
put extension-relevant information in every single Plan node, and a
hook call for every single plan node as well.
Yes, we don't know which path tree is the final one. What's more, a plan
tree may contain nodes that have never been in the path tree (Hash and
Sort nodes, for example). My approach grew out of years of struggling
to match the path -> plan and plan -> 'no path' cases. The create_plan
hook provides a straightforward way to do that, with guarantees.
I think both approaches have some advantages. One advantage of my
proposal is that it's cheaper. Your proposal makes every Plan node 8
bytes larger even though most of the time that extra pointer will be
NULL. I have been yelled at in the past for proposing to increase the
size of Plan, so I'm a little reluctant to believe that it's OK to do
that here.
That's why I have not proposed this approach before. But now I see how
specific plan nodes grow just to let EXPLAIN be more detailed (remember
the distinct-value estimates recently exposed for the Memoize node). The
same may become usual for HashJoin and IncrementalSort. IndexScan still
hides a lot of the optimiser decisions that are sometimes needed to
reveal performance issues. MergeJoin doesn't always show hidden
optimisations... So, I think it would be better to extend the Plan node
once and let in-core and external modules put their data inside that
list on demand.
It might be less relevant now, as I think before we might
have been just on the cusp of needing one more cache line for every
Plan node, and it doesn't look like that's true currently, so maybe it
wouldn't provoke as much objection, but I'm still nervous about the
idea.
I think it deserves a fresh look.
A related disadvantage of your approach is that it needs to
consider calling the hook function lots of times instead of just one
time, though perhaps that's too insignificant to bother about. Also,
The final plan does not contain enough nodes for that to matter.
Additionally, plan tracking extensions will need to add data to most
plan nodes. So, it would allow us to design more efficient extensions,
with the data right in place (the Plan node) and no need to go through a
hash table or node tree.
with my approach it's possible to propagate information from
PlannerInfo or PlannerGlobal structs, not just individual Plan nodes.
PlannerInfo and PlannerGlobal are the right nodes to be extended. I also
constantly patch the RelOptInfo node because it is a highly stable node
during the planning phase and represents a kind of relational operation.
In my extensions/patches, PlannerGlobal data is usually needed for the
initialisation of the EState, and PlannerInfo data is used to set up
PlannedStmt and Subplan nodes properly.
yet, and it's not real obvious how to match things up. Your approach
solves this problem by putting a callback in a place where it gets
passed the Path and the corresponding Plan at the same time. That's
extremely convenient.
Yes, when we need to track plan decisions and observe their impact on
query (node) performance, the stability of this match becomes critical.
Another thing that is different is that my patch series is clearer
about how multiple unrelated planner extensions are intended to
coexist. That's not a fundamental advantage of my approach, because
the same idea could be integrated into what you've done; it's only a
difference in how things stand as currently proposed.
We have the same issue with hooks, but up to now extensions have lived
together, sharing them. With a convention based on DefElem or an
Extensible node, we can forget about that issue. More importantly, the
data needs to survive read/write/copy operations, at least to live inside
a parallel worker, the plan cache, or a generic plan, and to pass the
READ/WRITE plan tree tests that can be enabled on demand at compile time.
My overall feeling is that we should try to come up with a unified
approach here. I'm not sure exactly what it should look like, though.
I think the strongest part of your proposal is the fact that it
connects each Path node to the corresponding Plan node in a very clear
way, and I think that the weakest part of your proposal is that it
makes each Plan node larger. I would be curious to hear what others
think.
I agree on the node growth issue.
Positive arguments off the top of my head:
1. It provides a clear and unified approach, allowing any node to be
extended in the same way, according to the same convention.
2. Extensions may collaborate through these fields. It is suitable for
developers who implement business logic close to the Postgres core as an
extension (remember Yuri Rashkovskii's initiative).
3. It is clearer how to maintain read/write/copy support for the object.
Generally, it seems we represent opposite design approaches. Business
app developers put a lot of logic in their code and want to reduce
overhead as much as possible. So, they value the flexibility that hooks
provide. In their minds, the DBMS is another glibc ;). You look at it
from the core-safety point of view and want to protect everything
possible. I'm not sure that is needed here.
Anyway, your approach will also let me drastically reduce the size of
the core patches my modules need, but at the cost of increased
complexity.
-- regards, Andrei Lepikhov