Memory consumed by paths during partitionwise join planning

Started by Ashutosh Bapat · over 2 years ago · 24 messages
#1 Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
6 attachment(s)

Hi All,

If add_path() and add_partial_path() do not find a new path to be
superior to any existing path, they free the new path. They free an
existing path if it is found to be inferior to the new path. But not
all paths surviving in a RelOptInfo are used to create paths for the
relations which take it as input. Further, add_path() and
add_partial_path() do not free the whole path tree, just the topmost
pathnode. The subpath nodes are not freed because they may be
referenced by other paths, so they continue to occupy memory even when
they are not used anywhere. As we build the relation tree upwards
(base, join, upper relations), more and more such paths accumulate and
continue to consume memory until the statement ends. With
partitionwise join involving partitioned tables with thousands of
partitions, this memory consumption increases in proportion to the
number of partitions.

Attached WIP patches address this issue by adding infrastructure to
track references to paths and free unreferenced paths once they can be
deemed useless. A new member, Path::ref_count, in a pathnode tracks
how many other objects, like pathlists in RelOptInfo and other paths,
reference that pathnode. As path nodes are referenced, they are
"linked" to the referencing objects using link_path(). When they are
no longer referenced, they are "unlinked" from the referencing objects
using unlink_path(). Path nodes are freed using free_path(), which
also unlinks their subpath nodes so that those can be freed when their
reference counts drop to 0. Paths whose reference counts reach 0
during unlinking are freed automatically using free_path(). Patch 0002
adds this infrastructure. 0003 and 0004 use these functions in example
cases.

With these patches the memory consumption numbers look as below.

Experiment
----------
Memory consumption is measured using a method similar to the one
described in [1]. The changes to EXPLAIN ANALYZE to report planning
memory are in 0001. We measure the memory consumed when planning a
self-join query, against unpartitioned and partitioned tables
respectively. The partitioned table has 1000 partitions. The table
definitions and helper function can be found in setup.sql and the
queries in queries.sql. This is the simplest setup; higher savings may
be seen with more complex setups involving indexes, SQL operators and
clauses.

Table 1: Join between unpartitioned tables
Number of tables | without patch | with patch | % reduction
being joined     |               |            |
-----------------+---------------+------------+------------
       2         |      29.0 KiB |   28.9 KiB |      0.66%
       3         |      79.1 KiB |   76.7 KiB |      3.03%
       4         |     208.6 KiB |  198.2 KiB |      4.97%
       5         |     561.6 KiB |  526.3 KiB |      6.28%

Table 2: join between partitioned tables with partitionwise join
enabled (enable_partitionwise_join = true).
Number of tables | without patch | with patch | % reduction
being joined     |               |            |
-----------------+---------------+------------+------------
       2         |      40.3 MiB |   40.0 MiB |      0.70%
       3         |     146.9 MiB |  143.1 MiB |      2.55%
       4         |     445.4 MiB |  430.4 MiB |      3.38%
       5         |    1563.3 MiB | 1517.6 MiB |      2.92%

The patch set is not complete because of the following issues:

a. Patches 0003 and 0004 do not cover all the path nodes. I have
covered only those I encountered in the queries I ran. If we accept
this proposal, I will work on covering all the path nodes.

b. It does not cover the entire RelOptInfo tree that the planner
creates. The paths in a lower-level RelOptInfo can be deemed useless
only when path creation for all the immediate upper RelOptInfos is
complete, so we need access to both upper- and lower-level RelOptInfos
at the same time. The RelOptInfos created while planning a join are
available in standard_join_search(), so in that function it is
possible to free unused paths listed in all the RelOptInfos except the
topmost one, as done in the patch. But the topmost RelOptInfo acts as
an input to the upper RelOptInfos in grouping_planner(), where we
don't have access to the RelOptInfos at lower levels in the join tree.
Hence we can't free the unused paths from the topmost RelOptInfo and
cascade that effect down the RelOptInfo tree. This is why we don't see
memory reduction in the 2-way join case, and also why the numbers
above are lower than the actual memory that can be saved. If we decide
to move ahead with this approach, I will work on this too.

c. The patch does not call link_path() and unlink_path() in all the
required places. Some work is needed to identify those places, and to
build infrastructure and methods to catch missed ones in future.

Another approach to fixing this would be to use a separate memory
context for creating path nodes and then delete the entire memory
context at the end of planning, once the plan is created. We would
also need to reset the pathlists as well as the cheapest_path members
in RelOptInfos. This would possibly free more memory and might be
faster, but I have not tried it. The approach taken in the patch has
an advantage over this one: paths can be freed at any stage in
planning using free_unused_paths_from_rel() implemented in the patch.
Thus we can monitor the memory consumption and trigger garbage
collection when it crosses a certain threshold. Or we may implement
both approaches: clean up every bit of path memory at the end of
planning, while garbage-collecting pathnodes whenever consumption goes
beyond the threshold. The reference mechanism may have other uses as
well.

Suggestions/comments welcome.

References
1. /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachments:

setup.sql (application/sql)
queries.sql (application/sql)
0004-Local-variables-pointing-to-path-node-used--20230727.patch (application/x-patch)
From 3d24fca1b861741949225b40f6df8892409e8d00 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 15:10:45 +0530
Subject: [PATCH 4/4] Local variables pointing to path node used repeatedly

In match_unsorted_outer() we create a materialize path for inner
relation and pass it to try_nestloop_path repeatedly. Link and unlink
the path to and from the local variable to keep track of it.
---
 src/backend/optimizer/path/joinpath.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index f047ad9ba4..d6560aa6d5 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1762,12 +1762,14 @@ match_unsorted_outer(PlannerInfo *root,
 		/*
 		 * Consider materializing the cheapest inner path, unless
 		 * enable_material is off or the path in question materializes its
-		 * output anyway.
+		 * output anyway. Link the path to a local reference so that it won't be
+		 * freed by add_path if the surrounding nest path is freed by add_path.
+		 * It will get freed if not used later.
 		 */
 		if (enable_material && inner_cheapest_total != NULL &&
 			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
-			matpath = (Path *)
-				create_material_path(innerrel, inner_cheapest_total);
+			link_path(&matpath,
+						(Path *) create_material_path(innerrel, inner_cheapest_total));
 	}
 
 	foreach(lc1, outerrel->pathlist)
@@ -1883,6 +1885,9 @@ match_unsorted_outer(PlannerInfo *root,
 								 false);
 	}
 
+	/* Materialized inner path won't be used anymore. Unlink it */
+	unlink_path(&matpath);
+
 	/*
 	 * Consider partial nestloop and mergejoin plan if outerrel has any
 	 * partial path and the joinrel is parallel-safe.  However, we can't
-- 
2.25.1

0001-Report-memory-used-for-planning-a-query-in--20230727.patch (application/x-patch)
From 062263008a413dd3561246f2f8793e7674a56fd1 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH 1/4] Report memory used for planning a query in EXPLAIN
 ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those cannot be attributed only to a given
query and hence should not be reported anyway.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c | 12 ++++++++++--
 src/backend/commands/prepare.c |  7 ++++++-
 src/backend/utils/mmgr/mcxt.c  | 19 +++++++++++++++++++
 src/include/commands/explain.h |  3 ++-
 src/include/utils/memutils.h   |  1 +
 5 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9f859949f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions,
 					planduration;
 		BufferUsage bufusage_start,
 					bufusage;
+		Size		mem_consumed;
 
 		if (es->buffers)
 			bufusage_start = pgBufferUsage;
 		INSTR_TIME_SET_CURRENT(planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 		/* plan the query */
 		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+						- mem_consumed;
 
 		/* calc differences of buffer counters. */
 		if (es->buffers)
@@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 
 		/* run it (if needed) and produce output */
 		ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
-					   &planduration, (es->buffers ? &bufusage : NULL));
+					   &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed);
 	}
 }
 
@@ -527,7 +531,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage)
+			   const BufferUsage *bufusage, const Size *mem_consumed)
 {
 	DestReceiver *dest;
 	QueryDesc  *queryDesc;
@@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
 		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
+
+		if (mem_consumed)
+			ExplainPropertyUInteger("Planning Memory", "bytes",
+										(uint64) *mem_consumed, es);
 	}
 
 	/* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..84544ce481 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	Size		mem_consumed;
 
 	if (es->buffers)
 		bufusage_start = pgBufferUsage;
 	INSTR_TIME_SET_CURRENT(planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 	/* Look it up in the hash table */
 	entry = FetchPreparedStatement(execstmt->name, true);
@@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+					- mem_consumed;
 
 	/* calc differences of buffer counters. */
 	if (es->buffers)
@@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 		if (pstmt->commandType != CMD_UTILITY)
 			ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
-						   &planduration, (es->buffers ? &bufusage : NULL));
+						   &planduration, (es->buffers ? &bufusage : NULL),
+						   &mem_consumed);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
 							  paramLI, queryEnv);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..43af271f33 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children,
 								 grand_totals.totalspace - grand_totals.freespace)));
 }
 
+/*
+ * Compute the memory used in the given context and its children.
+ *
+ * XXX: Instead of this separate function we could modify
+ * MemoryContextStatsDetail() to report used memory and disable printing the
+ * detailed stats.
+ */
+extern Size
+MemoryContextMemUsed(MemoryContext context)
+{
+	MemoryContextCounters grand_totals;
+
+	memset(&grand_totals, 0, sizeof(grand_totals));
+
+	MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false);
+
+	return grand_totals.totalspace - grand_totals.freespace;
+}
+
 /*
  * MemoryContextStatsInternal
  *		One recursion level for MemoryContextStats
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..21e3d2f309 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
 						   ExplainState *es, const char *queryString,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
-						   const BufferUsage *bufusage);
+						   const BufferUsage *bufusage,
+						   const Size *mem_consumed);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 21640d62a6..d7c477f229 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children,
 									 bool print_to_stderr);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
 												bool allow);
+extern Size MemoryContextMemUsed(MemoryContext context);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 extern void MemoryContextCheck(MemoryContext context);
-- 
2.25.1

0003-Actual-code-to-use-pathnode-referencing-inf-20230727.patch (application/x-patch)
From 21b6132417644a07054a85b52e402c31d5c23420 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 10:57:30 +0530
Subject: [PATCH 3/4] Actual code to use pathnode referencing infrastructure

This commit uses the infrastructure built by the previous commit to
reference, unreference and free paths.

The code is not completely mature. There are TODOs.
---
 src/backend/optimizer/util/pathnode.c | 126 +++++++++++++++-----------
 1 file changed, 71 insertions(+), 55 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 09745d19b1..65469637b2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -586,12 +586,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->pathlist = foreach_delete_current(parent_rel->pathlist,
 														  p1);
-
-			/*
-			 * Delete the data pointed-to by the deleted cell, if possible
-			 */
-			if (!IsA(old_path, IndexPath))
-				pfree(old_path);
+			/* TODO: write a routine to unlink a path from the list node and delete the list node  */
+			unlink_path(&old_path);
 		}
 		else
 		{
@@ -614,12 +610,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place in pathlist */
 		parent_rel->pathlist =
 			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO: write a function to link_path in a List */
 	}
 	else
 	{
-		/* Reject and recycle the new path */
-		if (!IsA(new_path, IndexPath))
-			pfree(new_path);
+		free_path(new_path);
 	}
 }
 
@@ -822,7 +818,8 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->partial_pathlist =
 				foreach_delete_current(parent_rel->partial_pathlist, p1);
-			pfree(old_path);
+			/* TODO use routine to unlink a path from the linked list */
+			unlink_path(&old_path);
 		}
 		else
 		{
@@ -845,11 +842,13 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place */
 		parent_rel->partial_pathlist =
 			list_insert_nth(parent_rel->partial_pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO create a routine to link path in a list at a given place */
 	}
 	else
 	{
 		/* Reject and recycle the new path */
-		pfree(new_path);
+		free_path(new_path);
 	}
 }
 
@@ -1196,7 +1195,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
 
-	pathnode->bitmapqual = bitmapqual;
+	link_path(&(pathnode->bitmapqual), bitmapqual);
 
 	cost_bitmap_heap_scan(&pathnode->path, root, rel,
 						  pathnode->path.param_info,
@@ -1231,6 +1230,8 @@ create_bitmap_and_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1283,6 +1284,8 @@ create_bitmap_or_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1452,6 +1455,8 @@ create_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: we should use the routine to link path to list */
+		subpath->ref_count++;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
@@ -1587,6 +1592,9 @@ create_merge_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: use routine which links a path into a list */
+		subpath->ref_count++;
+
 		pathnode->path.rows += subpath->rows;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
@@ -1713,7 +1721,7 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_material(&pathnode->path,
 				  subpath->startup_cost,
@@ -1747,7 +1755,7 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->hash_operators = hash_operators;
 	pathnode->param_exprs = param_exprs;
 	pathnode->singlerow = singlerow;
@@ -1838,7 +1846,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->in_operators = sjinfo->semi_operators;
 	pathnode->uniq_exprs = sjinfo->semi_rhs_exprs;
 
@@ -1859,7 +1867,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = subpath->total_cost;
 		pathnode->path.pathkeys = subpath->pathkeys;
 
-		rel->cheapest_unique_path = (Path *) pathnode;
+		/* TODO: Do we need to make sure that cheapest_unique_path is NULL. */
+		link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 		MemoryContextSwitchTo(oldcontext);
 
@@ -1897,7 +1906,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 				pathnode->path.total_cost = subpath->total_cost;
 				pathnode->path.pathkeys = subpath->pathkeys;
 
-				rel->cheapest_unique_path = (Path *) pathnode;
+				link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 				MemoryContextSwitchTo(oldcontext);
 
@@ -1992,7 +2001,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = sort_path.total_cost;
 	}
 
-	rel->cheapest_unique_path = (Path *) pathnode;
+	link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -2023,7 +2032,7 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->path.pathtarget = target ? target : rel->reltarget;
@@ -2114,7 +2123,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = NIL;	/* Gather has unordered result */
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->single_copy = false;
 
@@ -2157,7 +2166,7 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_subqueryscan(pathnode, root, rel, pathnode->path.param_info,
 					  trivial_pathtarget);
@@ -2385,7 +2394,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
@@ -2435,7 +2444,7 @@ create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
@@ -2480,7 +2489,7 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
@@ -2610,8 +2619,8 @@ create_nestloop_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 
 	final_cost_nestloop(root, pathnode, workspace, extra);
@@ -2674,8 +2683,8 @@ create_mergejoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_mergeclauses = mergeclauses;
 	pathnode->outersortkeys = outersortkeys;
@@ -2751,8 +2760,8 @@ create_hashjoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = NIL;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_hashclauses = hashclauses;
 	/* final_cost_hashjoin will fill in pathnode->num_batches */
@@ -2793,6 +2802,9 @@ create_projection_path(PlannerInfo *root,
 		Assert(subpp->path.parent == rel);
 		subpath = subpp->subpath;
 		Assert(!IsA(subpath, ProjectionPath));
+
+		/* Free it if not used anywhere else. */
+		unlink_path((Path **) &subpp);
 	}
 
 	pathnode->path.pathtype = T_Result;
@@ -2808,7 +2820,7 @@ create_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -2927,21 +2939,21 @@ apply_projection_to_path(PlannerInfo *root,
 		{
 			GatherPath *gpath = (GatherPath *) path;
 
-			gpath->subpath = (Path *)
-				create_projection_path(root,
-									   gpath->subpath->parent,
-									   gpath->subpath,
-									   target);
+			link_path(&gpath->subpath, 
+						(Path *) create_projection_path(root,
+											   			gpath->subpath->parent,
+											   			gpath->subpath,
+											   			target));
 		}
 		else
 		{
 			GatherMergePath *gmpath = (GatherMergePath *) path;
 
-			gmpath->subpath = (Path *)
-				create_projection_path(root,
-									   gmpath->subpath->parent,
-									   gmpath->subpath,
-									   target);
+			link_path(&gmpath->subpath, 
+						(Path *) create_projection_path(root,
+															gmpath->subpath->parent,
+															gmpath->subpath,
+															target));
 		}
 	}
 	else if (path->parallel_safe &&
@@ -2990,7 +3002,7 @@ create_set_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order XXX? */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Estimate number of rows produced by SRFs for each row of input; if
@@ -3059,7 +3071,7 @@ create_incremental_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
@@ -3106,7 +3118,7 @@ create_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_sort(&pathnode->path, root, pathkeys,
 			  subpath->total_cost,
@@ -3152,7 +3164,7 @@ create_group_path(PlannerInfo *root,
 	/* Group doesn't change sort ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->groupClause = groupClause;
 	pathnode->qual = qual;
@@ -3210,7 +3222,7 @@ create_upper_unique_path(PlannerInfo *root,
 	/* Unique doesn't change the input ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->numkeys = numCols;
 
 	/*
@@ -3267,7 +3279,7 @@ create_agg_path(PlannerInfo *root,
 		pathnode->path.pathkeys = subpath->pathkeys;	/* preserves order */
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->aggstrategy = aggstrategy;
 	pathnode->aggsplit = aggsplit;
@@ -3330,7 +3342,7 @@ create_groupingsets_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Simplify callers by downgrading AGG_SORTED to AGG_PLAIN, and AGG_MIXED
@@ -3568,7 +3580,7 @@ create_windowagg_path(PlannerInfo *root,
 	/* WindowAgg preserves the input sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->winclause = winclause;
 	pathnode->qual = qual;
 	pathnode->topwindow = topwindow;
@@ -3638,7 +3650,7 @@ create_setop_path(PlannerInfo *root,
 	pathnode->path.pathkeys =
 		(strategy == SETOP_SORTED) ? subpath->pathkeys : NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->cmd = cmd;
 	pathnode->strategy = strategy;
 	pathnode->distinctList = distinctList;
@@ -3697,8 +3709,8 @@ create_recursiveunion_path(PlannerInfo *root,
 	/* RecursiveUnion result is always unsorted */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->leftpath = leftpath;
-	pathnode->rightpath = rightpath;
+	link_path(&pathnode->leftpath, leftpath);
+	link_path(&pathnode->rightpath, rightpath);
 	pathnode->distinctList = distinctList;
 	pathnode->wtParam = wtParam;
 	pathnode->numGroups = numGroups;
@@ -3740,7 +3752,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->rowMarks = rowMarks;
 	pathnode->epqParam = epqParam;
 
@@ -3843,7 +3855,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
 		pathnode->path.pathtarget->width = 0;
 	}
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->operation = operation;
 	pathnode->canSetTag = canSetTag;
 	pathnode->nominalRelation = nominalRelation;
@@ -3901,7 +3913,7 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.startup_cost = subpath->startup_cost;
 	pathnode->path.total_cost = subpath->total_cost;
 	pathnode->path.pathkeys = subpath->pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->limitOffset = limitOffset;
 	pathnode->limitCount = limitCount;
 	pathnode->limitOption = limitOption;
@@ -4182,6 +4194,7 @@ do { \
 	(path) = reparameterize_path_by_child(root, (path), child_rel); \
 	if ((path) == NULL) \
 		return NULL; \
+	link_path(&(path), (path)); \
 } while(0)
 
 #define REPARAMETERIZE_CHILD_PATH_LIST(pathlist) \
@@ -4469,11 +4482,14 @@ reparameterize_pathlist_by_child(PlannerInfo *root,
 
 		if (path == NULL)
 		{
+			/* TODO: unlink paths in the list */
 			list_free(result);
 			return NIL;
 		}
 
+		/* TODO: need a routine to link a path into a linked list */
 		result = lappend(result, path);
+		path->ref_count++;
 	}
 
 	return result;
-- 
2.25.1

0002-Basic-infrastructure-to-link-unlink-and-fre-20230727.patch (application/x-patch)
From 38d79202803836613e8e6d366db1650e503de419 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 10:44:18 +0530
Subject: [PATCH 2/4] Basic infrastructure to link, unlink and free pathnodes

add_path and add_partial_path free path nodes which they deem sub-optimal. But
they just free the uppermost pathnode in the path. The subpaths continue to
occupy memory even if they are not used anywhere. The subpath nodes are not
freed because they may be referenced by other paths. This commit introduces the
infrastructure to track references to paths.

As the path nodes are referenced they are "linked" using link_path() to
referencing objects. When the path nodes are no longer useful they are
"unlinked" using unlink_path() from the referencing objects. The paths whose
references reach 0 during unlinking are freed automatically using
free_path(). The function unlinks the sub path nodes and can be called
when freeing any path node.

When the final path for join tree is chosen the paths linked to each
participating relation are unlinked, thus ultimately getting freed if not used.

TODO
The functions free_path(), unlink_path() and link_path() have ereports
to catch code paths which do not use these function correctly. They call
errbacktrace() which is not used anywhere else. These ereport calls will
need to be fixed for productization.
---
 src/backend/optimizer/path/allpaths.c |  77 +++++++++++++++
 src/backend/optimizer/util/pathnode.c | 136 ++++++++++++++++++++++++++
 src/include/nodes/pathnodes.h         |   1 +
 src/include/optimizer/pathnode.h      |  38 +++++++
 4 files changed, 252 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9bdc70c702..f16fd4747c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3386,6 +3386,81 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 	}
 }
 
+/*
+ * TODO: Need similar code to free paths in upper relations.
+ */
+static void
+free_unused_paths_from_rel(RelOptInfo *rel)
+{
+	ListCell	   *lc_path;
+	int				cnt_part;
+
+	foreach(lc_path, rel->pathlist)
+	{
+		Path *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			/* TODO: use routine to unlink path from list */
+			rel->pathlist = foreach_delete_current(rel->pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/* Do the same for partial pathlist. */
+	foreach(lc_path, rel->partial_pathlist)
+	{
+		Path *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/*
+	 * TODO: We can perform this in generate_partitionwise_paths as well since
+	 * by that time all the paths from partitions will be linked if used.
+	 */
+	
+	/* Free unused paths from the partition relations */
+	for (cnt_part = 0; cnt_part < rel->nparts; cnt_part++)
+	{
+		free_unused_paths_from_rel(rel->part_rels[cnt_part]);
+	}
+}
+
+/*
+ * Free unused paths from all the relations created while building the join tree.
+ */
+static void
+free_unused_paths(PlannerInfo *root, int levels_needed)
+{
+	int		cnt;
+
+	for (cnt = levels_needed - 1; cnt >= 1; cnt--)
+	{
+		ListCell *lc;
+
+		foreach (lc, root->join_rel_level[cnt])
+		{
+			free_unused_paths_from_rel(lfirst(lc));
+		}
+	}
+
+	for (cnt = 0; cnt < root->simple_rel_array_size; cnt++)
+	{
+		RelOptInfo   *rel = root->simple_rel_array[cnt];
+
+		/* Skip empty slots */
+		if (rel != NULL)
+			free_unused_paths_from_rel(rel);
+	}
+}
+
 /*
  * standard_join_search
  *	  Find possible joinpaths for a query by successively finding ways
@@ -3487,6 +3562,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		}
 	}
 
+	free_unused_paths(root, levels_needed);
+
 	/*
 	 * We should have a single rel at the final level.
 	 */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 5f5596841c..09745d19b1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -915,6 +915,142 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+free_path(Path *path)
+{
+	/* If the path is still referenced, freeing it would create a dangling pointer. */
+	/* TODO: it could just be an Assert? */
+	if (path->ref_count != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d has reference count %d, can not be freed",
+							path, path->pathtype, path->ref_count)));
+		return;
+	}
+
+	/*
+	 * A path referenced in the parent relation's pathlist can't be freed.
+	 * Ideally such a path should have ref_count >= 1. If this happens it
+	 * indicates that the path was not linked somewhere, and yet unlinked
+	 * (usually by free_path()).
+	 */
+	if (list_member_ptr(path->parent->pathlist, path))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d has reference count %d but is linked in parent's pathlist, can not be freed",
+							path, path->pathtype, path->ref_count)));
+		return;
+	}
+	
+	/* Decrement the reference counts of paths referenced by this one. */
+	switch(path->pathtype)
+	{
+		case T_SeqScan:
+		case T_IndexScan:
+		case T_IndexOnlyScan:
+			/* Simple paths reference no other path. */
+			break;
+
+		case T_MergeJoin:
+		case T_HashJoin:
+		case T_NestLoop:
+			{
+				JoinPath *jpath = (JoinPath *)path;
+
+				unlink_path(&(jpath->outerjoinpath));
+				unlink_path(&(jpath->innerjoinpath));
+			}
+			break;
+
+		case T_Append:
+		case T_MergeAppend:
+			{
+				AppendPath *appath = (AppendPath *)path;
+				ListCell   *lc;
+
+				foreach (lc, appath->subpaths)
+				{
+					Path *subpath = lfirst(lc);
+
+					/* TODO use a routine to unlink path from list. */
+					appath->subpaths = foreach_delete_current(appath->subpaths, lc);
+					unlink_path(&subpath);
+				}
+			}
+			break;
+
+		case T_Gather:
+			{
+				GatherPath *gpath = (GatherPath *) path;
+
+				unlink_path(&(gpath->subpath));
+			}
+			break;
+
+		case T_GatherMerge:
+			{
+				GatherMergePath *gmpath = (GatherMergePath *)path;
+
+				unlink_path(&gmpath->subpath);
+			}
+			break;
+
+		case T_BitmapHeapScan:
+			{
+				BitmapHeapPath *bhpath = (BitmapHeapPath *)path;
+
+				unlink_path(&(bhpath->bitmapqual));
+			}
+			break;
+
+		case T_Material:
+			{
+				MaterialPath *mpath = (MaterialPath *)path;
+
+				unlink_path(&(mpath->subpath));
+			}
+			break;
+
+		case T_Memoize:
+			{
+				MemoizePath *mpath = (MemoizePath *)path;
+
+				unlink_path(&mpath->subpath);
+			}
+			break;
+		
+		case T_Result:
+			{
+				/* All Result paths except ProjectionPath are simple paths without any subpath. */
+				if (IsA(path, ProjectionPath))
+				{
+					ProjectionPath *ppath = (ProjectionPath *) path;
+
+					unlink_path(&ppath->subpath);
+				}
+			}
+			break;
+
+		default:
+			ereport(WARNING,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+						errbacktrace(),
+						errmsg("unrecognized path type %d", path->pathtype)));
+			break;
+	}
+
+	/*
+	 * TODO: add_path does not free IndexPaths, but we do that here. Is there a
+	 * hazard?
+	 */
+	
+	/* Now reclaim the memory. */
+	pfree(path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c17b53f7ad..2881085892 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1631,6 +1631,7 @@ typedef struct Path
 
 	/* sort ordering of path's output; a List of PathKey nodes; see above */
 	List	   *pathkeys;
+	int			ref_count;
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 001e75b5b7..8df835368d 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -16,6 +16,7 @@
 
 #include "nodes/bitmapset.h"
 #include "nodes/pathnodes.h"
+#include "limits.h"
 
 
 /*
@@ -295,6 +296,43 @@ extern Path *reparameterize_path(PlannerInfo *root, Path *path,
 								 double loop_count);
 extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
 										  RelOptInfo *child_rel);
+extern void free_path(Path *path);
+
+static inline void
+link_path(Path **path_link, Path *path)
+{
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d has negative reference count %d",
+							path, path->pathtype, path->ref_count)));
+
+	path->ref_count++;
+	*path_link = path;
+	Assert(path->ref_count > 0 && path->ref_count <= INT_MAX);
+}
+
+static inline void
+unlink_path(Path **path_link)
+{
+	Path	   *path = *path_link;
+
+	/* A path to be unlinked should have been linked before. */
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d had negative reference count %d",
+							path, path->pathtype, path->ref_count)));
+
+	path->ref_count--;
+	*path_link = NULL;
+
+	/* Free path if none is referencing it. */
+	if (path->ref_count == 0)
+		free_path(path);
+}
 
 /*
  * prototypes for relnode.c
-- 
2.25.1

#2 David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#1)
Re: Memory consumed by paths during partitionwise join planning

On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Table 1: Join between unpartitioned tables
Number of tables | without patch | with patch | % reduction |
being joined     |               |            |             |
--------------------------------------------------------------
               2 |      29.0 KiB |   28.9 KiB |       0.66% |
               3 |      79.1 KiB |   76.7 KiB |       3.03% |
               4 |     208.6 KiB |  198.2 KiB |       4.97% |
               5 |     561.6 KiB |  526.3 KiB |       6.28% |

I have mostly just random thoughts and some questions...

Have you done any analysis of the node types that are consuming all
that memory? Are Path and subclasses of Path really that dominant in
this? The memory savings aren't that impressive and it makes me
wonder if this is worth the trouble.

What's the goal of the memory reduction work? If it's for
performance, does this increase performance? Tracking refcounts of
Paths isn't free.

Why do your refcounts start at 1? This seems weird:

+ /* Free the path if none references it. */
+ if (path->ref_count == 1)

Does ref_count really need to be an int? There's a 16-bit hole you
could use between param_info and parallel_aware. I doubt you'd need
32 bits of space for this.

I agree that it would be nice to get rid of the "if (!IsA(old_path,
IndexPath))" hack in add_path() and it seems like this idea could
allow that. However, I think if we were to have something like this
then you'd want all the logic to be neatly contained inside pathnode.c
without adding any burden in other areas to track or check Path
refcounts.

I imagine the patch would be neater if you added some kind of Path
traversal function akin to expression_tree_walker() that allows you to
visit each subpath recursively and run a callback function on each.
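For illustration, such a walker might look like the sketch below. Everything here is an illustrative stand-in, not PostgreSQL's real Path nodes or API: MockPath collapses the various subpath fields into three pointers, and the real walker would need a switch over path types the way free_path() in the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for a Path node; real Paths carry costs,
 * pathkeys, a parent RelOptInfo, and type-specific subpath fields. */
typedef struct MockPath
{
	struct MockPath *subpath;	/* single child (Sort, Material, ...) */
	struct MockPath *outerpath;	/* join children */
	struct MockPath *innerpath;
} MockPath;

typedef bool (*path_walker_fn) (MockPath *path, void *context);

/* Run the callback on a path and, recursively, on all of its subpaths,
 * akin to expression_tree_walker(): returning true aborts the walk. */
static bool
mock_path_tree_walker(MockPath *path, path_walker_fn walker, void *context)
{
	if (path == NULL)
		return false;
	if (walker(path, context))
		return true;			/* callback requested abort */
	if (mock_path_tree_walker(path->subpath, walker, context))
		return true;
	if (mock_path_tree_walker(path->outerpath, walker, context))
		return true;
	return mock_path_tree_walker(path->innerpath, walker, context);
}

/* Example callback: count every node visited. */
static bool
count_path_nodes(MockPath *path, void *context)
{
	++*(int *) context;
	return false;				/* continue walking */
}
```

With such a walker, the big switch in free_path() would reduce to a single callback that unlinks each direct subpath.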

David

#3 Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: David Rowley (#2)
Re: Memory consumed by paths during partitionwise join planning

On Wed, Nov 29, 2023 at 1:10 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Table 1: Join between unpartitioned tables
Number of tables | without patch | with patch | % reduction |
being joined     |               |            |             |
--------------------------------------------------------------
               2 |      29.0 KiB |   28.9 KiB |       0.66% |
               3 |      79.1 KiB |   76.7 KiB |       3.03% |
               4 |     208.6 KiB |  198.2 KiB |       4.97% |
               5 |     561.6 KiB |  526.3 KiB |       6.28% |

I have mostly just random thoughts and some questions...

Have you done any analysis of the node types that are consuming all
that memory? Are Path and subclasses of Path really that dominant in
this? The memory savings aren't that impressive and it makes me
wonder if this is worth the trouble.

This thread and patch targets saving memory consumed by Path nodes
i.e. Path and its subclasses. Breakdown of memory consumed by various
nodes can be found in [1] for partitioned table queries.

What's the goal of the memory reduction work? If it's for
performance, does this increase performance? Tracking refcounts of
Paths isn't free.

This memory reduction work is part of work to reduce planner's memory
consumption while planning queries involving partitioned tables with a
large number (thousands) of partitions [1]. The second table (copied
below) in my first email in this thread gives the memory saved by
freeing unused path nodes when planning queries involving partitioned
tables with a thousand partitions each.

Table 2: join between partitioned tables with partitionwise join
enabled (enable_partitionwise_join = true).
Number of tables | without patch | with patch | % reduction |
being joined     |               |            |             |
--------------------------------------------------------------
               2 |      40.3 MiB |   40.0 MiB |       0.70% |
               3 |     146.9 MiB |  143.1 MiB |       2.55% |
               4 |     445.4 MiB |  430.4 MiB |       3.38% |
               5 |    1563.3 MiB | 1517.6 MiB |       2.92% |

Percentage-wise the memory savings are not great, but because of the
sheer amount of memory used, the savings are in MBs. Also, I don't
think I managed to free all the unused paths, for the reasons listed in
my original mail. But I was doubtful whether the work is worth it, so I
halted it. I see you think that it's not worth it. So, one vote against it.

Why do your refcounts start at 1? This seems weird:

+ /* Free the path if none references it. */
+ if (path->ref_count == 1)

Refcounts start from 0. This code is from free_unused_paths_from_rel(),
which frees paths in rel->pathlist. At this stage every such path is
referenced from rel->pathlist, so its count will be >= 1. We should free
only the paths whose refcount is exactly 1, i.e. those not referenced
anywhere else.

Does ref_count really need to be an int? There's a 16-bit hole you
could use between param_info and parallel_aware. I doubt you'd need
32 bits of space for this.

A 16-bit field allows the refcount to go up to 65535. That lies between
C(18,9) = 48620 and C(19,9) = 92378, so if the cheapest path of a given
relation in a 19-relation query were referenced by every join that
relation participates in, the refcount would overflow. If we aren't
dealing with queries involving that many relations already, we will
soon. Maybe we should add code to protect against overflow.
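For illustration, such an overflow guard could look like the following sketch. The helper name and the choice of an unsigned 16-bit field are my assumptions, not anything in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard for a 16-bit refcount: refuse to increment rather
 * than silently wrap, letting the caller report the exhaustion. */
static inline bool
bump_ref16(uint16_t *ref_count)
{
	if (*ref_count == UINT16_MAX)
		return false;			/* would overflow; caller must handle */
	(*ref_count)++;
	return true;
}
```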

I agree that it would be nice to get rid of the "if (!IsA(old_path,
IndexPath))" hack in add_path() and it seems like this idea could
allow that. However, I think if we were to have something like this
then you'd want all the logic to be neatly contained inside pathnode.c
without adding any burden in other areas to track or check Path
refcounts.

The code is in the following files:
src/backend/optimizer/path/allpaths.c:
The definitions of free_unused_paths_from_rel() and
free_unused_paths() can be moved to pathnode.c

src/backend/optimizer/util/pathnode.c
src/include/nodes/pathnodes.h
src/include/optimizer/pathnode.h
Those three together I consider as pathnode.c. But let me know if you
feel otherwise.

src/backend/optimizer/path/joinpath.c
this is the only place, other than pathnode.c, where we use
link_path(). That's required to stop add_path() from freeing a
materialized path that is tried multiple times. We can't add_path()
that path to the rel, since it would get kicked out by the path it is
materializing for. But I think it's still path creation code, so it
should be OK. Let me know if you feel otherwise.

There may be other places but I can find those out and consider them
case by case basis.

I imagine the patch would be neater if you added some kind of Path
traversal function akin to expression_tree_walker() that allows you to
visit each subpath recursively and run a callback function on each.

Yes. Agreed.

I am fine to work on this further if the community thinks it is
acceptable. It seems from your point of view the worth of this work is
as follows
a. memory savings - not worth it
b. getting rid of !IsA(old_path, IndexPath) - worth it
c. problem of same path being added to multiple relations - not worth
it since you have other solution

Can't judge whether that means it's acceptable or not.

[1]: /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

#4 David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: Memory consumed by paths during partitionwise join planning

On Fri, 1 Dec 2023 at 19:59, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

I am fine to work on this further if the community thinks it is
acceptable. It seems from your point of view the worth of this work is
as follows
a. memory savings - not worth it
b. getting rid of !IsA(old_path, IndexPath) - worth it
c. problem of same path being added to multiple relations - not worth
it since you have other solution

Can't judge whether that means it's acceptable or not.

I think it's worth trying to get this working and we can run some
performance tests to see if it's cheap enough to use.

I've not had enough time to fully process your patches, but on a quick
look, I don't quite understand why link_path() does not have to
recursively increment ref_counts in each of its subpaths. It seems
you're doing the opposite in free_path(), so it seems like this would
break for cases like a SortPath on say, a LimitPath over a SeqScan.
When you call create_sort_path you'll only increment the refcount on
the LimitPath. The SeqScan path then has a refcount 1 lower than it
should. If something calls unlink_path on the SeqScan path that could
put the refcount to 0 and break the SortPath by freeing a Path
indirectly referenced by the sort.

Recursively incrementing the refcounts would slow the patch down a
bit, so we'd need to test the performance of this. I think at the
rate we create and free join paths in complex join problems we could
end up bumping refcounts quite often.

Another way to fix the pfree issues was to add infrastructure to
shallow copy paths. We could shallow copy paths whenever we borrow a
Path from another RelOptInfo and then just reset the parent to the new
RelOptInfo. That unfortunately makes your memory issue a bit worse.
We discussed this a bit in [1]. It also seems there was some
discussion about wrapping a Path up in a ProjectionPath in there. That
unfortunately also takes us in the opposite direction in terms of your
memory reduction goals.

Maybe we can try to move forward with your refcount idea and see how
the performance looks. If that's intolerable then that might help us
decide on the next best alternative solution.

David

[1]: /messages/by-id/CAApHDvo8tiB5xbJa94f4Mo8Of2fPJ2zG+zQQQTGr-ThjSsymQQ@mail.gmail.com

#5 Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: David Rowley (#4)
Re: Memory consumed by paths during partitionwise join planning

On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 1 Dec 2023 at 19:59, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

I am fine to work on this further if the community thinks it is
acceptable. It seems from your point of view the worth of this work is
as follows
a. memory savings - not worth it
b. getting rid of !IsA(old_path, IndexPath) - worth it
c. problem of same path being added to multiple relations - not worth
it since you have other solution

Can't judge whether that means it's acceptable or not.

I think it's worth trying to get this working and we can run some
performance tests to see if it's cheap enough to use.

I've not had enough time to fully process your patches, but on a quick
look, I don't quite understand why link_path() does not have to
recursively increment ref_counts in each of its subpaths. It seems
you're doing the opposite in free_path(), so it seems like this would
break for cases like a SortPath on say, a LimitPath over a SeqScan.
When you call create_sort_path you'll only increment the refcount on
the LimitPath. The SeqScan path then has a refcount 1 lower than it
should. If something calls unlink_path on the SeqScan path that could
put the refcount to 0 and break the SortPath by freeing a Path
indirectly referenced by the sort.

Let me explain how linking and unlinking works. You may skip over it
if you are comfortable with this logic already.

Refcounts count how many other paths or objects are referencing a
given path. E.g. we have three path chains as follows
1. joinpath_1->joinpath_2->seqpath_1,
2. joinpath_3->joinpath_4->seqpath_1,
3. joinpath_5->joinpath_2->seqpath_1

Please note that this is not full path tree/forest. It is difficult to
describe the whole path forest in text format. But this should be
sufficient to get the gist of how linking and unlinking works.

Let's say all these paths are listed in pathlists of respective
RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
part of the topmost RelOptInfo. Then the refcounts will be as follows
joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
joinpath_2 - 3 (joinpath_1, joinpath_5 and RelOptInfo)
joinpath_4 - 2 (joinpath_3, RelOptInfo)
seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)

If joinpath_3 is finally chosen to be converted into a plan, joinpath_1
and joinpath_5 will be unlinked; their refcounts drop to 0 and they
will be freed. Unlinking and freeing them reduces the refcount of
joinpath_2 to 1. When we cleanse the pathlist of the corresponding
RelOptInfo, joinpath_2's refcount will drop to 0 and it will be freed.
Freeing joinpath_2 reduces the refcount of seqpath_1 to 2 (joinpath_4
and RelOptInfo). joinpath_3 and joinpath_4 will have their refcounts
unchanged.

This will leave the paths joinpath_3, joinpath_4 and seqpath_1 to be
converted to plans. The rest of the paths will be freed.
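The cascade described above can be modeled in a few lines of C. This is a toy sketch under assumed names (toy_link_path and friends are mine, not the patch's code), showing that dropping the last reference to a path frees it, which in turn unlinks its subpaths:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy path node: just a refcount and up to two subpath links. */
typedef struct RefPath
{
	int			ref_count;
	bool		freed;			/* stands in for pfree(), for inspection */
	struct RefPath *sub1;
	struct RefPath *sub2;
} RefPath;

static void toy_free_path(RefPath *path);

static void
toy_link_path(RefPath **path_link, RefPath *path)
{
	path->ref_count++;
	*path_link = path;
}

static void
toy_unlink_path(RefPath **path_link)
{
	RefPath    *path = *path_link;

	*path_link = NULL;
	if (--path->ref_count == 0)
		toy_free_path(path);
}

/* Drop the references this node holds; subpaths whose counts reach 0
 * are freed in turn, giving the cascade described above. */
static void
toy_free_path(RefPath *path)
{
	if (path->sub1)
		toy_unlink_path(&path->sub1);
	if (path->sub2)
		toy_unlink_path(&path->sub2);
	path->freed = true;
}
```

Unlinking an unused join path frees the whole chain above a shared scan path, while the scan path itself survives as long as some pathlist or other path still holds a reference.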

If we increment the refcount all the way down the path forest, they
won't really be refcounts anymore since they will count indirect
references as well. Then unlink_path also has to traverse the entire
tree resetting refcounts. Both of those are unnecessary. Please note
that it's only free_path which calls unlink_path on the referenced
paths not unlink_path itself.

In your example, the path chain looks like this
sort_path->limit_path->seqpath with refcounts 1, 1, 2 respectively. I
am assuming that seqpath is referenced by pathlist of its RelOptInfo
and limitpath. sortpath is referenced by its RelOptInfo or is chosen
as the final path. limitpath is not referenced in a RelOptInfo but is
referenced in sortpath.

Now the only things that can call unlink_path on seqpath are free_path
on limitpath OR pathlist cleaning of its RelOptInfo. In both the cases
the refcount will correctly report the number of objects referencing
it. So I don't see a problem here. If you can provide me a query that
does this, I will test it.

Recursively incrementing the refcounts would slow the patch down a
bit, so we'd need to test the performance of this. I think at the
rate we create and free join paths in complex join problems we could
end up bumping refcounts quite often.

This won't be necessary as explained above.

Another way to fix the pfree issues was to add infrastructure to
shallow copy paths. We could shallow copy paths whenever we borrow a
Path from another RelOptInfo and then just reset the parent to the new
RelOptInfo. That unfortunately makes your memory issue a bit worse.
We discussed this a bit in [1]. It also seems there was some
discussion about wrapping a Path up in a ProjectionPath in there. That
unfortunately also takes us in the opposite direction in terms of your
memory reduction goals.

Yeah.

Maybe we can try to move forward with your refcount idea and see how
the performance looks. If that's intolerable then that might help us
decide on the next best alternative solution.

I will report performance numbers (planning time) with my current
patchset, which has the limitations listed in [1]. If performance gets
worse, we will abandon this idea. If not, I will work on completing
the patch. Does that work for you?

[1]: /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

#6 David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#5)
Re: Memory consumed by paths during partitionwise join planning

On Fri, 8 Dec 2023 at 18:02, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

given path. E.g. we have three path chains as follows
1. joinpath_1->joinpath_2->seqpath_1,
2. joinpath_3->joinpath_4->seqpath_1,
3. joinpath_5->joinpath_2->seqpath_1

Please note that this is not full path tree/forest. It is difficult to
describe the whole path forest in text format. But this should be
sufficient to get the gist of how linking and unlinking works.

Let's say all these paths are listed in pathlists of respective
RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
part of the topmost RelOptInfo. Then the refcounts will be as follows
joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
joinpath_2 - 3 (joinpath_1, joinpath_5 and RelOptInfo)
joinpath_4 - 2 (joinpath_3, RelOptInfo)
seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)

I think this likely works fine providing you always unlink paths from
the root of the path tree. When you start unlinking from non-root
paths I think you could start to see problems unless you increment
refcounts recursively.

The problem I had when working on improving the union planner was when
building the final paths for a union child.

Let's say I have the query: SELECT a,b FROM t1 UNION SELECT x,y FROM t2;

When planning the first union child, I'd like to provide the union
planner with a sorted path and also the cheapest scan path on t1 to
allow it to Hash Aggregate on the cheapest path.

Let's say the planner produces the following paths on t1.

SeqScan_t1

When I create the final union child paths I want to try and produce a
few sorted paths to allow MergeAppend to work. I'll loop over each
path in t1's pathlist.

SeqScan_t1 isn't sorted, so I'll make Sort1->SeqScan_t1 and add_path
that to final_rel.

Now, I also want to allow cheap Hash Aggregates to implement the
UNION, so I'll include SeqScan_t1 without sorting it.

If I now do add_path(final_rel, SeqScan_t1), add_path() could see that
the SeqScan cost is fuzzily the same as the Sort->SeqScan_t1 and since
the sort version has better PathKeys, add_path will unlink SeqScan_t1.

Now, I don't think your scheme for refcounting paths is broken by this
because you'll increment the refcount of the SeqScan_t1 when the Sort
uses it as its subpath, but if instead of a Sort->SeqScan that was
IncrementalSort->Sort->SeqScan, then suddenly you're not incrementing
the refcount for the SeqScan when you create the Incremental Sort due
to the Sort being between the two.

So, if I now do add_path(final_rel, Sort2->Incremental_Sort->SeqScan_t1)

Sort1 is rejected due to cost and we unlink it. Sort1 refcount gets to
0 and we free the path and the SeqScan_t1's refcount goes down by 1
but does not reach 0.

We then add_path(final_rel, SeqScan_t1) but this path is fuzzily the
same cost as Sort2 so we reject SeqScan_t1 due to Sort2 having better
pathkeys and we unlink SeqScan_t1. Refcount on SeqScan_t1 gets to 0
and we free the path. We need SeqScan_t1 for the incremental sort.

Perhaps in reality here, since SeqScan_t1 exists in the base rel's
pathlist we'd still have a refcount from that, but I assume at some
point you want to start freeing up paths from RelOptInfos that we're
done with, in which case the SeqScan_t1's refcount would reach 0 at
that point and we'd crash during create plan of the
Sort2->Incremental_Sort->SeqScan_t1 plan.

Have I misunderstood something here? As far as I see it with your
non-recursive refcounts, it's only safe to free from the root of a
pathtree and isn't safe when unlinking paths that are the subpath of
another path unless the unlinking is done from the root.

David

#7 Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: David Rowley (#6)
Re: Memory consumed by paths during partitionwise join planning

On Fri, Dec 8, 2023 at 1:02 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 8 Dec 2023 at 18:02, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

given path. E.g. we have three path chains as follows
1. joinpath_1->joinpath_2->seqpath_1,
2. joinpath_3->joinpath_4->seqpath_1,
3. joinpath_5->joinpath_2->seqpath_1

Please note that this is not full path tree/forest. It is difficult to
describe the whole path forest in text format. But this should be
sufficient to get the gist of how linking and unlinking works.

Let's say all these paths are listed in pathlists of respective
RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
part of the topmost RelOptInfo. Then the refcounts will be as follows
joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo)
joinpath_4 - 2 (joinpath_3, RelOptInfo)
seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)

I think this likely works fine providing you always unlink paths from
the root of the path tree. When you start unlinking from non-root
paths I think you could start to see problems unless you increment
refcounts recursively.

The problem I had when working on improving the union planner was when
building the final paths for a union child.

Let's say I have the query: SELECT a,b FROM t1 UNION SELECT x,y FROM t2;

When planning the first union child, I'd like to provide the union
planner with a sorted path and also the cheapest scan path on t1 to
allow it to Hash Aggregate on the cheapest path.

Let's say the planner produces the following paths on t1.

SeqScan_t1

When I create the final union child paths I want to try and produce a
few sorted paths to allow MergeAppend to work. I'll loop over each
path in t1's pathlist.

SeqScan_t1 isn't sorted, so I'll make Sort1->SeqScan_t1 add add_path
that to final_rel.

Now, I also want to allow cheap Hash Aggregates to implement the
UNION, so I'll include SeqScan_t1 without sorting it.

If I now do add_path(final_rel, SeqScan_t1), add_path() could see that
the SeqScan cost is fuzzily the same as the Sort->SeqScan_t1 and since
the sort version has better PathKeys, add_path will unlink SeqScan_t1.

Now, I don't think your scheme for refcounting paths is broken by this
because you'll increment the refcount of the SeqScan_t1 when the Sort
uses it as its subpath, but if instead of a Sort->SeqScan that was
IncrementalSort->Sort->SeqScan, then suddenly you're not incrementing
the refcount for the SeqScan when you create the Incremental Sort due
to the Sort being between the two.

So, if I now do add_path(final_rel, Sort2->Incremental_Sort->SeqScan_t1)

Sort1 is rejected due to cost and we unlink it. Sort1 refcount gets to
0 and we free the path and the SeqScan_t1's refcount goes down by 1
but does not reach 0.

What is Sort1? Sort1->Seqscan_t1 OR incremental_sort->Sort->seqscan_t1?

I am getting lost here. But I think you have misunderstood the code so
this question is irrelevant. See next.

We then add_path(final_rel, SeqScan_t1) but this path is fuzzily the
same cost as Sort2 so we reject SeqScan_t1 due to Sort2 having better
pathkeys and we unlink SeqScan_t1. Refcount on SeqScan_t1 gets to 0
and we free the path. We need SeqScan_t1 for the incremental sort.

The path being presented to add_path() is never passed to
unlink_path(). If it's suboptimal compared to some existing path, it is
passed to free_path(), which in its current form will throw an error.
But that can be fixed: free_path() can simply ignore a path with
refcount > 0, if we want to present the same path to add_path()
multiple times.
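A minimal sketch of that revised behaviour, with toy names rather than the patch's actual code: freeing becomes a no-op while the path is still referenced, so presenting the same path to add_path() repeatedly is harmless.

```c
#include <stdbool.h>

/* Illustrative stand-in for a Path with a reference count. */
typedef struct ToyPath
{
	int			ref_count;
	bool		freed;			/* stands in for pfree(), for inspection */
} ToyPath;

/* Free the path only if nothing references it; report what happened. */
static bool
toy_free_path_if_unreferenced(ToyPath *path)
{
	if (path->ref_count > 0)
		return false;			/* still linked somewhere; leave it alone */
	path->freed = true;
	return true;
}
```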

Perhaps in reality here, since SeqScan_t1 exists in the base rel's
pathlist we'd still have a refcount from that, but I assume at some
point you want to start freeing up paths from RelOptInfos that we're
done with, in which case the SeqScan_t1's refcount would reach 0 at
that point and we'd crash during create plan of the
Sort2->Incremental_Sort->SeqScan_t1 plan.

Have I misunderstood something here? As far as I see it with your
non-recursive refcounts, it's only safe to free from the root of a
pathtree and isn't safe when unlinking paths that are the subpath of
another path unless the unlinking is done from the root.

As long as we call link_path every time we reference a path, we could
start freeing paths from anywhere. The reference count takes care of
not freeing referenced paths. Of course, I will need to fix
free_path() as above.

--
Best Wishes,
Ashutosh Bapat

#8Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: David Rowley (#4)
Re: Memory consumed by paths during partitionwise join planning

On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:

Maybe we can try to move forward with your refcount idea and see how
the performance looks. If that's intolerable then that might help us
decide on the next best alternative solution.

Here are performance numbers

setup

create table t1 (a integer primary key, b integer);
create table t1_parted (a integer primary key, b integer) partition by range(a);
create 1000 partitions of t1_parted

query (a five way self-join)
select * from t1 a, t1 b, t1 c, t1 d, t1 e where a.a = b.a and b.a =
c.a and c.a = d.a and d.a = e.a -- unpartitioned table case
select * from t1_parted a, t1_parted b, t1_parted c, t1_parted d,
t1_parted e where a.a = b.a and b.a = c.a and c.a = d.a and d.a =
e.a; -- partitioned table case

The numbers with patches attached to [1] with limitations listed in
the email are thus

Ran each query 10 times through EXPLAIN (SUMMARY) and averaged
planning time with and without patch.
unpartitioned case
without patch: 0.25
with patch: 0.19
This improvement is probably misleading; the planning time for this
query changes a lot.

partitioned case (without partitionwise join)
without patch: 14580.65
with patch: 14625.12
% degradation: 0.3%

partitioned case (with partitionwise join)
without patch: 23273.69
with patch: 23328.52
% degradation: 0.2%

That looks pretty small considering the benefits. What do you think?

[1]: /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

#9Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ashutosh Bapat (#8)
4 attachment(s)
Re: Memory consumed by paths during partitionwise join planning

Forgot to mention,

On Thu, Dec 14, 2023 at 5:34 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:

Maybe we can try to move forward with your refcount idea and see how
the performance looks. If that's intolerable then that might help us
decide on the next best alternative solution.

Here are performance numbers

setup

create table t1 (a integer primary key, b integer);
create table t1_parted (a integer primary key, b integer) partition by range(a);
create 1000 partitions of t1_parted

query (a five way self-join)
select * from t1 a, t1 b, t1 c, t1 d, t1 e where a.a = b.a and b.a =
c.a and c.a = d.a and d.a = e.a -- unpartitioned table case
select * from t1_parted a, t1_parted b, t1_parted c, t1_parted d,
t1_parted e where a.a = b.a and b.a = c.a and c.a = d.a and d.a =
e.a; -- partitioned table case

The numbers with patches attached to [1] with limitations listed in
the email are thus

Ran each query 10 times through EXPLAIN (SUMMARY) and averaged
planning time with and without patch.
unpartitioned case
without patch: 0.25
with patch: 0.19
This improvement is probably misleading; the planning time for this
query changes a lot.

partitioned case (without partitionwise join)
without patch: 14580.65
with patch: 14625.12
% degradation: 0.3%

partitioned case (with partitionwise join)
without patch: 23273.69
with patch: 23328.52
% degradation: 0.2%

That looks pretty small considering the benefits. What do you think?

[1] /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

If you want to experiment, please use the attached patches. They include
a fix for a segfault during initdb. The patches are still raw.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0002-Basic-infrastructure-to-link-unlink-and-fre-20231215.patchtext/x-patch; charset=US-ASCII; name=0002-Basic-infrastructure-to-link-unlink-and-fre-20231215.patchDownload
From abcd735ca251f4832125aa38cee6d03743dcf8bb Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 10:44:18 +0530
Subject: [PATCH 2/4] Basic infrastructure to link, unlink and free pathnodes

add_path and add_partial_path free path nodes which they deem sub-optimal. But
they just free the uppermost pathnode in the path. The subpaths continue to
occupy memory even if they are not used anywhere. The subpath nodes are not
freed because they may be referenced by other paths. This commit introduces the
infrastructure to track references to paths.

As the path nodes are referenced they are "linked" using link_path() to
referencing objects. When the path nodes are no longer useful they are
"unlinked" using unlink_path() from the referencing objects. The paths whose
references reach 0 during unlinking are freed automatically using
free_path(). The function unlinks the sub path nodes and can be called
when freeing any path node.

When the final path for the join tree is chosen, the paths linked to each
participating relation are unlinked, thus ultimately getting freed if not used.

TODO
The functions free_path(), unlink_path() and link_path() have ereports
to catch code paths which do not use these function correctly. They call
errbacktrace() which is not used anywhere else. These ereport calls will
need to be fixed for productization.
---
 src/backend/optimizer/path/allpaths.c |  77 +++++++++++++++
 src/backend/optimizer/util/pathnode.c | 136 ++++++++++++++++++++++++++
 src/include/nodes/pathnodes.h         |   1 +
 src/include/optimizer/pathnode.h      |  38 +++++++
 4 files changed, 252 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 67921a0826..b7098ac39e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3388,6 +3388,81 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 	}
 }
 
+/*
+ * TODO: Need similar code to free paths in upper relations.
+ */
+static void
+free_unused_paths_from_rel(RelOptInfo *rel)
+{
+	ListCell	   *lc_path;
+	int				cnt_part;
+
+	foreach(lc_path, rel->pathlist)
+	{
+		Path *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			/* TODO: use routine to unlink path from list */
+			rel->pathlist = foreach_delete_current(rel->pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/* Do the same for partial pathlist. */
+	foreach(lc_path, rel->partial_pathlist)
+	{
+		Path *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/*
+	 * TODO: We can perform this in generate_partitionwise_paths as well since
+	 * by that time all the paths from partitions will be linked if used.
+	 */
+	
+	/* Free unused paths from the partition relations */
+	for (cnt_part = 0; cnt_part < rel->nparts; cnt_part++)
+	{
+		free_unused_paths_from_rel(rel->part_rels[cnt_part]);
+	}
+}
+
+/*
+ * Free unused paths from all the relations created while building the join tree.
+ */
+static void
+free_unused_paths(PlannerInfo *root, int levels_needed)
+{
+	int		cnt;
+
+	for (cnt = levels_needed - 1; cnt >= 1; cnt--)
+	{
+		ListCell *lc;
+
+		foreach (lc, root->join_rel_level[cnt])
+		{
+			free_unused_paths_from_rel(lfirst(lc));
+		}
+	}
+
+	for (cnt = 0; cnt < root->simple_rel_array_size; cnt++)
+	{
+		RelOptInfo   *rel = root->simple_rel_array[cnt];
+
+		/* Skip empty slots */
+		if (rel != NULL)
+			free_unused_paths_from_rel(rel);
+	}
+}
+
 /*
  * standard_join_search
  *	  Find possible joinpaths for a query by successively finding ways
@@ -3490,6 +3565,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		}
 	}
 
+	free_unused_paths(root, levels_needed);
+
 	/*
 	 * We should have a single rel at the final level.
 	 */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..e00eb0e19b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -915,6 +915,142 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+free_path(Path *path)
+{
+	/* If the path is still referenced, freeing it would create a dangling pointer. */
+	/* TODO: it could just be an Assert? */
+	if (path->ref_count != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d has reference count %d, can not be freed",
+							path, path->pathtype, path->ref_count)));
+		return;
+	}
+
+	/*
+	 * A path referenced in the parent relation's pathlist can't be freed.
+	 * Ideally such a path should have ref_count >= 1. If this happens it
+	 * indicates that the path was not linked somewhere, and yet unlinked
+	 * (usually by free_path()).
+	 */
+	if (list_member_ptr(path->parent->pathlist, path))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d has reference count %d but is linked in parent's pathlist, can not be freed",
+							path, path->pathtype, path->ref_count)));
+		return;
+	}
+	
+	/* Decrement the reference counts of paths referenced by this one. */
+	switch(path->pathtype)
+	{
+		case T_SeqScan:
+		case T_IndexScan:
+		case T_IndexOnlyScan:
+			/* Simple paths reference no other path. */
+			break;
+
+		case T_MergeJoin:
+		case T_HashJoin:
+		case T_NestLoop:
+			{
+				JoinPath *jpath = (JoinPath *)path;
+
+				unlink_path(&(jpath->outerjoinpath));
+				unlink_path(&(jpath->innerjoinpath));
+			}
+			break;
+
+		case T_Append:
+		case T_MergeAppend:
+			{
+				AppendPath *appath = (AppendPath *)path;
+				ListCell   *lc;
+
+				foreach (lc, appath->subpaths)
+				{
+					Path *subpath = lfirst(lc);
+
+					/* TODO use a routine to unlink path from list. */
+					appath->subpaths = foreach_delete_current(appath->subpaths, lc);
+					unlink_path(&subpath);
+				}
+			}
+			break;
+
+		case T_Gather:
+			{
+				GatherPath *gpath = (GatherPath *) path;
+
+				unlink_path(&(gpath->subpath));
+			}
+			break;
+
+		case T_GatherMerge:
+			{
+				GatherMergePath *gmpath = (GatherMergePath *)path;
+
+				unlink_path(&gmpath->subpath);
+			}
+			break;
+
+		case T_BitmapHeapScan:
+			{
+				BitmapHeapPath *bhpath = (BitmapHeapPath *)path;
+
+				unlink_path(&(bhpath->bitmapqual));
+			}
+			break;
+
+		case T_Material:
+			{
+				MaterialPath *mpath = (MaterialPath *)path;
+
+				unlink_path(&(mpath->subpath));
+			}
+			break;
+
+		case T_Memoize:
+			{
+				MemoizePath *mpath = (MemoizePath *)path;
+
+				unlink_path(&mpath->subpath);
+			}
+			break;
+		
+		case T_Result:
+			{
+				/* All Result paths except ProjectionPath are simple paths without any subpath. */
+				if (IsA(path, ProjectionPath))
+				{
+					ProjectionPath *ppath = (ProjectionPath *) path;
+
+					unlink_path(&ppath->subpath);
+				}
+			}
+			break;
+
+		default:
+			ereport(WARNING,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+						errbacktrace(),
+						errmsg("unrecognized path type %d", path->pathtype)));
+			break;
+	}
+
+	/*
+	 * TODO: add_path does not free IndexPaths, but we do that here. Is there a
+	 * hazard?
+	 */
+	
+	/* Now reclaim the memory. */
+	pfree(path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ed85dc7414..24b01028ec 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1631,6 +1631,7 @@ typedef struct Path
 
 	/* sort ordering of path's output; a List of PathKey nodes; see above */
 	List	   *pathkeys;
+	int			ref_count;
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 6e557bebc4..8990c6c3e0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -16,6 +16,7 @@
 
 #include "nodes/bitmapset.h"
 #include "nodes/pathnodes.h"
+#include "limits.h"
 
 
 /*
@@ -298,6 +299,43 @@ extern Path *reparameterize_path(PlannerInfo *root, Path *path,
 								 double loop_count);
 extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
 										  RelOptInfo *child_rel);
+extern void free_path(Path *path);
+
+static inline void
+link_path(Path **path_link, Path *path)
+{
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d has negative reference count %d",
+							path, path->pathtype, path->ref_count)));
+
+	path->ref_count++;
+	*path_link = path;
+	Assert(path->ref_count > 0 && path->ref_count <= INT_MAX);
+}
+
+static inline void
+unlink_path(Path **path_link)
+{
+	Path	   *path = *path_link;
+
+	/* A path to be unlinked should have been linked before. */
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+					errbacktrace(),
+					errmsg("path node %p of type %d had negative reference count %d",
+							path, path->pathtype, path->ref_count)));
+
+	path->ref_count--;
+	*path_link = NULL;
+
+	/* Free path if none is referencing it. */
+	if (path->ref_count == 0)
+		free_path(path);
+}
 
 /*
  * prototypes for relnode.c
-- 
2.25.1

0001-Report-memory-used-for-planning-a-query-in--20231215.patchtext/x-patch; charset=US-ASCII; name=0001-Report-memory-used-for-planning-a-query-in--20231215.patchDownload
From 6d8ab0a1239fd4fcfecf99934d9c826e47f5392c Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH 1/4] Report memory used for planning a query in EXPLAIN
 ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those can not be attributed only to a given
query and hence should not be reported any way.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c | 12 ++++++++++--
 src/backend/commands/prepare.c |  7 ++++++-
 src/backend/utils/mmgr/mcxt.c  | 19 +++++++++++++++++++
 src/include/commands/explain.h |  3 ++-
 src/include/utils/memutils.h   |  1 +
 5 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..e114fdc44e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions,
 					planduration;
 		BufferUsage bufusage_start,
 					bufusage;
+		Size		mem_consumed;
 
 		if (es->buffers)
 			bufusage_start = pgBufferUsage;
 		INSTR_TIME_SET_CURRENT(planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 		/* plan the query */
 		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+						- mem_consumed;
 
 		/* calc differences of buffer counters. */
 		if (es->buffers)
@@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 
 		/* run it (if needed) and produce output */
 		ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
-					   &planduration, (es->buffers ? &bufusage : NULL));
+					   &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed);
 	}
 }
 
@@ -527,7 +531,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage)
+			   const BufferUsage *bufusage, const Size *mem_consumed)
 {
 	DestReceiver *dest;
 	QueryDesc  *queryDesc;
@@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
 		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
+
+		if (mem_consumed)
+			ExplainPropertyUInteger("Planning Memory", "bytes",
+										(uint64) *mem_consumed, es);
 	}
 
 	/* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..84544ce481 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	Size		mem_consumed;
 
 	if (es->buffers)
 		bufusage_start = pgBufferUsage;
 	INSTR_TIME_SET_CURRENT(planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 	/* Look it up in the hash table */
 	entry = FetchPreparedStatement(execstmt->name, true);
@@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+					- mem_consumed;
 
 	/* calc differences of buffer counters. */
 	if (es->buffers)
@@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 		if (pstmt->commandType != CMD_UTILITY)
 			ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
-						   &planduration, (es->buffers ? &bufusage : NULL));
+						   &planduration, (es->buffers ? &bufusage : NULL),
+						   &mem_consumed);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
 							  paramLI, queryEnv);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..43af271f33 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children,
 								 grand_totals.totalspace - grand_totals.freespace)));
 }
 
+/*
+ * Compute the memory used in the given context and its children.
+ *
+ * XXX: Instead of this separate function we could modify
+ * MemoryContextStatsDetail() to report used memory and disable printing the
+ * detailed stats.
+ */
+extern Size
+MemoryContextMemUsed(MemoryContext context)
+{
+	MemoryContextCounters grand_totals;
+
+	memset(&grand_totals, 0, sizeof(grand_totals));
+
+	MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false);
+
+	return grand_totals.totalspace - grand_totals.freespace;
+}
+
 /*
  * MemoryContextStatsInternal
  *		One recursion level for MemoryContextStats
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index f9525fb572..381476836e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
 						   ExplainState *es, const char *queryString,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
-						   const BufferUsage *bufusage);
+						   const BufferUsage *bufusage,
+						   const Size *mem_consumed);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index d14e8546a6..8b616c0e66 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -89,6 +89,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children,
 									 bool print_to_stderr);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
 												bool allow);
+extern Size MemoryContextMemUsed(MemoryContext context);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 extern void MemoryContextCheck(MemoryContext context);
-- 
2.25.1

0003-Actual-code-to-use-pathnode-referencing-inf-20231215.patchtext/x-patch; charset=US-ASCII; name=0003-Actual-code-to-use-pathnode-referencing-inf-20231215.patchDownload
From 0c7c4c4634c40150f180ba2b8cddc70a507f0442 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 10:57:30 +0530
Subject: [PATCH 3/4] Actual code to use pathnode referencing infrastructure

The commit uses the infrastructure built by the previous commit to
reference, unreference and free paths.

The code is not completely mature. There are TODOs.
---
 src/backend/optimizer/util/pathnode.c | 126 +++++++++++++++-----------
 1 file changed, 71 insertions(+), 55 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e00eb0e19b..0d5ab390c4 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -586,12 +586,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->pathlist = foreach_delete_current(parent_rel->pathlist,
 														  p1);
-
-			/*
-			 * Delete the data pointed-to by the deleted cell, if possible
-			 */
-			if (!IsA(old_path, IndexPath))
-				pfree(old_path);
+			/* TODO: write a routine to unlink a path from the list node and delete the list node  */
+			unlink_path(&old_path);
 		}
 		else
 		{
@@ -614,12 +610,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place in pathlist */
 		parent_rel->pathlist =
 			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO: write a function to link_path in a List */
 	}
 	else
 	{
-		/* Reject and recycle the new path */
-		if (!IsA(new_path, IndexPath))
-			pfree(new_path);
+		free_path(new_path);
 	}
 }
 
@@ -822,7 +818,8 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->partial_pathlist =
 				foreach_delete_current(parent_rel->partial_pathlist, p1);
-			pfree(old_path);
+			/* TODO use routine to unlink a path from the linked list */
+			unlink_path(&old_path);
 		}
 		else
 		{
@@ -845,11 +842,13 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place */
 		parent_rel->partial_pathlist =
 			list_insert_nth(parent_rel->partial_pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO create a routine to link path in a list at a given place */
 	}
 	else
 	{
 		/* Reject and recycle the new path */
-		pfree(new_path);
+		free_path(new_path);
 	}
 }
 
@@ -1196,7 +1195,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
 
-	pathnode->bitmapqual = bitmapqual;
+	link_path(&(pathnode->bitmapqual), bitmapqual);
 
 	cost_bitmap_heap_scan(&pathnode->path, root, rel,
 						  pathnode->path.param_info,
@@ -1231,6 +1230,8 @@ create_bitmap_and_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1283,6 +1284,8 @@ create_bitmap_or_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1452,6 +1455,8 @@ create_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: we should use the routine to link path to list */
+		subpath->ref_count++;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
@@ -1587,6 +1592,9 @@ create_merge_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: use routine which links a path into a list */
+		subpath->ref_count++;
+
 		pathnode->path.rows += subpath->rows;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
@@ -1713,7 +1721,7 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_material(&pathnode->path,
 				  subpath->startup_cost,
@@ -1747,7 +1755,7 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->hash_operators = hash_operators;
 	pathnode->param_exprs = param_exprs;
 	pathnode->singlerow = singlerow;
@@ -1838,7 +1846,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->in_operators = sjinfo->semi_operators;
 	pathnode->uniq_exprs = sjinfo->semi_rhs_exprs;
 
@@ -1859,7 +1867,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = subpath->total_cost;
 		pathnode->path.pathkeys = subpath->pathkeys;
 
-		rel->cheapest_unique_path = (Path *) pathnode;
+		/* TODO: Do we need to make sure that cheapest_unique_path is NULL. */
+		link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 		MemoryContextSwitchTo(oldcontext);
 
@@ -1897,7 +1906,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 				pathnode->path.total_cost = subpath->total_cost;
 				pathnode->path.pathkeys = subpath->pathkeys;
 
-				rel->cheapest_unique_path = (Path *) pathnode;
+				link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 				MemoryContextSwitchTo(oldcontext);
 
@@ -1992,7 +2001,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = sort_path.total_cost;
 	}
 
-	rel->cheapest_unique_path = (Path *) pathnode;
+	link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -2023,7 +2032,7 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->path.pathtarget = target ? target : rel->reltarget;
@@ -2114,7 +2123,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = NIL;	/* Gather has unordered result */
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->single_copy = false;
 
@@ -2157,7 +2166,7 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_subqueryscan(pathnode, root, rel, pathnode->path.param_info,
 					  trivial_pathtarget);
@@ -2386,7 +2395,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2438,7 +2447,7 @@ create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2485,7 +2494,7 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2616,8 +2625,8 @@ create_nestloop_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 
 	final_cost_nestloop(root, pathnode, workspace, extra);
@@ -2680,8 +2689,8 @@ create_mergejoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_mergeclauses = mergeclauses;
 	pathnode->outersortkeys = outersortkeys;
@@ -2757,8 +2766,8 @@ create_hashjoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = NIL;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_hashclauses = hashclauses;
 	/* final_cost_hashjoin will fill in pathnode->num_batches */
@@ -2799,6 +2808,9 @@ create_projection_path(PlannerInfo *root,
 		Assert(subpp->path.parent == rel);
 		subpath = subpp->subpath;
 		Assert(!IsA(subpath, ProjectionPath));
+
+		/* Free it if not used anywhere else. */
+		unlink_path((Path **) &subpp);
 	}
 
 	pathnode->path.pathtype = T_Result;
@@ -2814,7 +2826,7 @@ create_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -2933,21 +2945,21 @@ apply_projection_to_path(PlannerInfo *root,
 		{
 			GatherPath *gpath = (GatherPath *) path;
 
-			gpath->subpath = (Path *)
-				create_projection_path(root,
-									   gpath->subpath->parent,
-									   gpath->subpath,
-									   target);
+			link_path(&gpath->subpath, 
+						(Path *) create_projection_path(root,
+											   			gpath->subpath->parent,
+											   			gpath->subpath,
+											   			target));
 		}
 		else
 		{
 			GatherMergePath *gmpath = (GatherMergePath *) path;
 
-			gmpath->subpath = (Path *)
-				create_projection_path(root,
-									   gmpath->subpath->parent,
-									   gmpath->subpath,
-									   target);
+			link_path(&gmpath->subpath, 
+						(Path *) create_projection_path(root,
+															gmpath->subpath->parent,
+															gmpath->subpath,
+															target));
 		}
 	}
 	else if (path->parallel_safe &&
@@ -2996,7 +3008,7 @@ create_set_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order XXX? */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Estimate number of rows produced by SRFs for each row of input; if
@@ -3065,7 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
@@ -3112,7 +3124,7 @@ create_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_sort(&pathnode->path, root, pathkeys,
 			  subpath->total_cost,
@@ -3158,7 +3170,7 @@ create_group_path(PlannerInfo *root,
 	/* Group doesn't change sort ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->groupClause = groupClause;
 	pathnode->qual = qual;
@@ -3216,7 +3228,7 @@ create_upper_unique_path(PlannerInfo *root,
 	/* Unique doesn't change the input ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->numkeys = numCols;
 
 	/*
@@ -3289,7 +3301,7 @@ create_agg_path(PlannerInfo *root,
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->aggstrategy = aggstrategy;
 	pathnode->aggsplit = aggsplit;
@@ -3352,7 +3364,7 @@ create_groupingsets_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Simplify callers by downgrading AGG_SORTED to AGG_PLAIN, and AGG_MIXED
@@ -3602,7 +3614,7 @@ create_windowagg_path(PlannerInfo *root,
 	/* WindowAgg preserves the input sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->winclause = winclause;
 	pathnode->qual = qual;
 	pathnode->topwindow = topwindow;
@@ -3671,7 +3683,7 @@ create_setop_path(PlannerInfo *root,
 	pathnode->path.pathkeys =
 		(strategy == SETOP_SORTED) ? subpath->pathkeys : NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->cmd = cmd;
 	pathnode->strategy = strategy;
 	pathnode->distinctList = distinctList;
@@ -3730,8 +3742,8 @@ create_recursiveunion_path(PlannerInfo *root,
 	/* RecursiveUnion result is always unsorted */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->leftpath = leftpath;
-	pathnode->rightpath = rightpath;
+	link_path(&pathnode->leftpath, leftpath);
+	link_path(&pathnode->rightpath, rightpath);
 	pathnode->distinctList = distinctList;
 	pathnode->wtParam = wtParam;
 	pathnode->numGroups = numGroups;
@@ -3773,7 +3785,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->rowMarks = rowMarks;
 	pathnode->epqParam = epqParam;
 
@@ -3876,7 +3888,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
 		pathnode->path.pathtarget->width = 0;
 	}
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->operation = operation;
 	pathnode->canSetTag = canSetTag;
 	pathnode->nominalRelation = nominalRelation;
@@ -3934,7 +3946,7 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.startup_cost = subpath->startup_cost;
 	pathnode->path.total_cost = subpath->total_cost;
 	pathnode->path.pathkeys = subpath->pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->limitOffset = limitOffset;
 	pathnode->limitCount = limitCount;
 	pathnode->limitOption = limitOption;
@@ -4215,6 +4227,7 @@ do { \
 	(path) = reparameterize_path_by_child(root, (path), child_rel); \
 	if ((path) == NULL) \
 		return NULL; \
+	link_path(&(path), (path)); \
 } while(0)
 
 #define REPARAMETERIZE_CHILD_PATH_LIST(pathlist) \
@@ -4506,11 +4519,14 @@ reparameterize_pathlist_by_child(PlannerInfo *root,
 
 		if (path == NULL)
 		{
+			/* TODO: unlink paths in the list */
 			list_free(result);
 			return NIL;
 		}
 
+		/* TODO: need a routine to link a path into a linked list */
 		result = lappend(result, path);
+		path->ref_count++;
 	}
 
 	return result;
-- 
2.25.1

Attachment: 0004-Local-variables-pointing-to-path-node-used--20231215.patch (text/x-patch)
From 057a5aed342ea7eafcde88543171dab57dd9e1b6 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 15:10:45 +0530
Subject: [PATCH 4/4] Local variables pointing to path node used repeatedly

In match_unsorted_outer() we create a materialize path for inner
relation and pass it to try_nestloop_path repeatedly. Link and unlink
the path to and from the local variable to keep track of it.
---
 src/backend/optimizer/path/joinpath.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index b2916ad690..696568a3f3 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1782,12 +1782,14 @@ match_unsorted_outer(PlannerInfo *root,
 		/*
 		 * Consider materializing the cheapest inner path, unless
 		 * enable_material is off or the path in question materializes its
-		 * output anyway.
+		 * output anyway. Link the path to a local reference so that it won't be
+		 * freed by add_path if the surrounding nest path is freed by add_path.
+		 * It will get freed if not used later.
 		 */
 		if (enable_material && inner_cheapest_total != NULL &&
 			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
-			matpath = (Path *)
-				create_material_path(innerrel, inner_cheapest_total);
+			link_path(&matpath,
+						(Path *) create_material_path(innerrel, inner_cheapest_total));
 	}
 
 	foreach(lc1, outerrel->pathlist)
@@ -1903,6 +1905,10 @@ match_unsorted_outer(PlannerInfo *root,
 								 false);
 	}
 
+	/* Materialized inner path won't be used anymore. Unlink it */
+	if (matpath)
+		unlink_path(&matpath);
+
 	/*
 	 * Consider partial nestloop and mergejoin plan if outerrel has any
 	 * partial path and the joinrel is parallel-safe.  However, we can't
-- 
2.25.1

#10 Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ashutosh Bapat (#9)
3 attachment(s)
Re: Memory consumed by paths during partitionwise join planning

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> That looks pretty small considering the benefits. What do you think?
>
> [1] /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
>
> If you want to experiment, please use attached patches. There's a fix
> for segfault during initdb in them. The patches are still raw.

First patch is no longer required. Here's a rebased set.

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

--
Best Wishes,
Ashutosh Bapat

Attachments:

Attachment: 0001-Basic-infrastructure-to-link-unlink-and-fre-20240206.patch (text/x-patch)
From 0a80eea476b696ad980a0a63858e2658a1b5b0de Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 10:44:18 +0530
Subject: [PATCH 1/3] Basic infrastructure to link, unlink and free pathnodes

add_path and add_partial_path free path nodes which they deem sub-optimal. But
they just free the uppermost pathnode in the path. The subpaths continue to
occupy memory even if they are not used anywhere. The subpath nodes are not
freed because they may be referenced by other paths. This commit introduces the
infrastructure to track references to paths.

As the path nodes are referenced they are "linked" using link_path() to
referencing objects. When the path nodes are no longer useful they are
"unlinked" using unlink_path() from the referencing objects. The paths whose
references reach 0 during unlinking are freed automatically using
free_path(). The function unlinks the sub path nodes and can be called
when freeing any path node.

When the final path for join tree is chosen the paths linked to each
participating relation are unlinked, thus ultimately getting freed if not used.

TODO
The functions free_path(), unlink_path() and link_path() have ereports
to catch code paths which do not use these function correctly. They call
errbacktrace() which is not used anywhere else. These ereport calls will
need to be fixed for productization.
---
 src/backend/optimizer/path/allpaths.c |  77 ++++++++++++++
 src/backend/optimizer/util/pathnode.c | 142 ++++++++++++++++++++++++++
 src/include/nodes/pathnodes.h         |   1 +
 src/include/optimizer/pathnode.h      |  38 +++++++
 4 files changed, 258 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84c4de488a..9c02f8adb5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3388,6 +3388,81 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 	}
 }
 
+/*
+ * TODO: Need similar code to free paths in upper relations.
+ */
+static void
+free_unused_paths_from_rel(RelOptInfo *rel)
+{
+	ListCell   *lc_path;
+	int			cnt_part;
+
+	foreach(lc_path, rel->pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			/* TODO: use routine to unlink path from list */
+			rel->pathlist = foreach_delete_current(rel->pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/* Do the same for partial pathlist. */
+	foreach(lc_path, rel->partial_pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/*
+	 * TODO: We can perform this in generate_partitionwise_paths as well since
+	 * by that time all the paths from partitions will be linked if used.
+	 */
+
+	/* Free unused paths from the partition relations */
+	for (cnt_part = 0; cnt_part < rel->nparts; cnt_part++)
+	{
+		free_unused_paths_from_rel(rel->part_rels[cnt_part]);
+	}
+}
+
+/*
+ * Free unused paths from all the relations created while building the join tree.
+ */
+static void
+free_unused_paths(PlannerInfo *root, int levels_needed)
+{
+	int			cnt;
+
+	for (cnt = levels_needed - 1; cnt >= 1; cnt--)
+	{
+		ListCell   *lc;
+
+		foreach(lc, root->join_rel_level[cnt])
+		{
+			free_unused_paths_from_rel(lfirst(lc));
+		}
+	}
+
+	for (cnt = 0; cnt < root->simple_rel_array_size; cnt++)
+	{
+		RelOptInfo *rel = root->simple_rel_array[cnt];
+
+		/* Skip empty slots */
+		if (rel != NULL)
+			free_unused_paths_from_rel(rel);
+	}
+}
+
 /*
  * standard_join_search
  *	  Find possible joinpaths for a query by successively finding ways
@@ -3490,6 +3565,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		}
 	}
 
+	free_unused_paths(root, levels_needed);
+
 	/*
 	 * We should have a single rel at the final level.
 	 */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2e1ec41a54..084d48b66b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -915,6 +915,148 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+free_path(Path *path)
+{
+	/*
+	 * If the path is still referenced, freeing it would create a dangling
+	 * pointer.
+	 */
+	/* TODO: it could just be an Assert? */
+	if (path->ref_count != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d has reference count %d, can not be freed",
+						path, path->pathtype, path->ref_count)));
+		return;
+	}
+
+	/*
+	 * A path referenced in the parent relation's pathlist can't be freed.
+	 * Ideally such a path should have ref_count >= 1. If this happens it
+	 * indicates that the path was not linked somewhere, and yet unlinked
+	 * (usually by free_path()).
+	 */
+	if (list_member_ptr(path->parent->pathlist, path))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d has reference count %d but is linked in parent's pathlist, can not be freed",
+						path, path->pathtype, path->ref_count)));
+		return;
+	}
+
+	/* Decrement the reference counts of paths referenced by this one. */
+	switch (path->pathtype)
+	{
+		case T_SeqScan:
+		case T_IndexScan:
+		case T_IndexOnlyScan:
+			/* Simple paths reference no other path. */
+			break;
+
+		case T_MergeJoin:
+		case T_HashJoin:
+		case T_NestLoop:
+			{
+				JoinPath   *jpath = (JoinPath *) path;
+
+				unlink_path(&(jpath->outerjoinpath));
+				unlink_path(&(jpath->innerjoinpath));
+			}
+			break;
+
+		case T_Append:
+		case T_MergeAppend:
+			{
+				AppendPath *appath = (AppendPath *) path;
+				ListCell   *lc;
+
+				foreach(lc, appath->subpaths)
+				{
+					Path	   *subpath = lfirst(lc);
+
+					/* TODO use a routine to unlink path from list. */
+					appath->subpaths = foreach_delete_current(appath->subpaths, lc);
+					unlink_path(&subpath);
+				}
+			}
+			break;
+
+		case T_Gather:
+			{
+				GatherPath *gpath = (GatherPath *) path;
+
+				unlink_path(&(gpath->subpath));
+			}
+			break;
+
+		case T_GatherMerge:
+			{
+				GatherMergePath *gmpath = (GatherMergePath *) path;
+
+				unlink_path(&gmpath->subpath);
+			}
+			break;
+
+		case T_BitmapHeapScan:
+			{
+				BitmapHeapPath *bhpath = (BitmapHeapPath *) path;
+
+				unlink_path(&(bhpath->bitmapqual));
+			}
+			break;
+
+		case T_Material:
+			{
+				MaterialPath *mpath = (MaterialPath *) path;
+
+				unlink_path(&(mpath->subpath));
+			}
+			break;
+
+		case T_Memoize:
+			{
+				MemoizePath *mpath = (MemoizePath *) path;
+
+				unlink_path(&mpath->subpath);
+			}
+			break;
+
+		case T_Result:
+			{
+				/*
+				 * All Result paths except ProjectionPath are simple paths
+				 * without any subpath.
+				 */
+				if (IsA(path, ProjectionPath))
+				{
+					ProjectionPath *ppath = (ProjectionPath *) path;
+
+					unlink_path(&ppath->subpath);
+				}
+			}
+			break;
+
+		default:
+			ereport(WARNING,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errbacktrace(),
+					 errmsg("unrecognized path type %d", path->pathtype)));
+			break;
+	}
+
+	/*
+	 * TODO: add_path does not free IndexPaths, but we do that here. Is there
+	 * a hazard?
+	 */
+
+	/* Now reclaim the memory. */
+	pfree(path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..48a8176e19 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1643,6 +1643,7 @@ typedef struct Path
 
 	/* sort ordering of path's output; a List of PathKey nodes; see above */
 	List	   *pathkeys;
+	int			ref_count;
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..90bff53dda 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -16,6 +16,7 @@
 
 #include "nodes/bitmapset.h"
 #include "nodes/pathnodes.h"
+#include "limits.h"
 
 
 /*
@@ -298,6 +299,43 @@ extern Path *reparameterize_path(PlannerInfo *root, Path *path,
 								 double loop_count);
 extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
 										  RelOptInfo *child_rel);
+extern void free_path(Path *path);
+
+static inline void
+link_path(Path **path_link, Path *path)
+{
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d has negative reference count %d",
+						path, path->pathtype, path->ref_count)));
+
+	path->ref_count++;
+	*path_link = path;
+	Assert(path->ref_count > 0 && path->ref_count <= INT_MAX);
+}
+
+static inline void
+unlink_path(Path **path_link)
+{
+	Path	   *path = *path_link;
+
+	/* A path to be unlinked should have been linked before. */
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d had negative reference count %d",
+						path, path->pathtype, path->ref_count)));
+
+	path->ref_count--;
+	*path_link = NULL;
+
+	/* Free path if none is referencing it. */
+	if (path->ref_count == 0)
+		free_path(path);
+}
 
 /*
  * prototypes for relnode.c
-- 
2.25.1

Attachment: 0002-Actual-code-to-use-pathnode-referencing-inf-20240206.patch (text/x-patch)
From 6c031c4f7130ca2cb3cbc6a599e1b7f35bfd2703 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 10:57:30 +0530
Subject: [PATCH 2/3] Actual code to use pathnode referencing infrastructure

The commit uses the infrastructure built by the previous commit to reference,
unreference and free paths.

The code is not completely mature. There are TODOs.
---
 src/backend/optimizer/util/pathnode.c | 124 +++++++++++++++-----------
 1 file changed, 72 insertions(+), 52 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 084d48b66b..37fc0b7a2b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -588,10 +588,10 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 														  p1);
 
 			/*
-			 * Delete the data pointed-to by the deleted cell, if possible
+			 * TODO: write a routine to unlink a path from the list node and
+			 * delete the list node
 			 */
-			if (!IsA(old_path, IndexPath))
-				pfree(old_path);
+			unlink_path(&old_path);
 		}
 		else
 		{
@@ -614,12 +614,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place in pathlist */
 		parent_rel->pathlist =
 			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO: write a function to link_path in a List */
 	}
 	else
 	{
-		/* Reject and recycle the new path */
-		if (!IsA(new_path, IndexPath))
-			pfree(new_path);
+		free_path(new_path);
 	}
 }
 
@@ -822,7 +822,8 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->partial_pathlist =
 				foreach_delete_current(parent_rel->partial_pathlist, p1);
-			pfree(old_path);
+			/* TODO use routine to unlink a path from the linked list */
+			unlink_path(&old_path);
 		}
 		else
 		{
@@ -845,11 +846,13 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place */
 		parent_rel->partial_pathlist =
 			list_insert_nth(parent_rel->partial_pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO create a routine to link path in a list at a given place */
 	}
 	else
 	{
 		/* Reject and recycle the new path */
-		pfree(new_path);
+		free_path(new_path);
 	}
 }
 
@@ -1202,7 +1205,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
 
-	pathnode->bitmapqual = bitmapqual;
+	link_path(&(pathnode->bitmapqual), bitmapqual);
 
 	cost_bitmap_heap_scan(&pathnode->path, root, rel,
 						  pathnode->path.param_info,
@@ -1237,6 +1240,8 @@ create_bitmap_and_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1289,6 +1294,8 @@ create_bitmap_or_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1458,6 +1465,8 @@ create_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: we should use the routine to link path to list */
+		subpath->ref_count++;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
@@ -1593,6 +1602,9 @@ create_merge_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: use routine which links a path into a list */
+		subpath->ref_count++;
+
 		pathnode->path.rows += subpath->rows;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
@@ -1719,7 +1731,7 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_material(&pathnode->path,
 				  subpath->startup_cost,
@@ -1753,7 +1765,7 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->hash_operators = hash_operators;
 	pathnode->param_exprs = param_exprs;
 	pathnode->singlerow = singlerow;
@@ -1844,7 +1856,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->in_operators = sjinfo->semi_operators;
 	pathnode->uniq_exprs = sjinfo->semi_rhs_exprs;
 
@@ -1865,7 +1877,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = subpath->total_cost;
 		pathnode->path.pathkeys = subpath->pathkeys;
 
-		rel->cheapest_unique_path = (Path *) pathnode;
+		/* TODO: Do we need to make sure that cheapest_unique_path is NULL. */
+		link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 		MemoryContextSwitchTo(oldcontext);
 
@@ -1903,7 +1916,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 				pathnode->path.total_cost = subpath->total_cost;
 				pathnode->path.pathkeys = subpath->pathkeys;
 
-				rel->cheapest_unique_path = (Path *) pathnode;
+				link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 				MemoryContextSwitchTo(oldcontext);
 
@@ -1998,7 +2011,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = sort_path.total_cost;
 	}
 
-	rel->cheapest_unique_path = (Path *) pathnode;
+	link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -2029,7 +2042,7 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->path.pathtarget = target ? target : rel->reltarget;
@@ -2120,7 +2133,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = NIL;	/* Gather has unordered result */
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->single_copy = false;
 
@@ -2163,7 +2176,7 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_subqueryscan(pathnode, root, rel, pathnode->path.param_info,
 					  trivial_pathtarget);
@@ -2392,7 +2405,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2444,7 +2457,7 @@ create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2491,7 +2504,7 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2644,8 +2657,8 @@ create_nestloop_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 
 	final_cost_nestloop(root, pathnode, workspace, extra);
@@ -2708,8 +2721,8 @@ create_mergejoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_mergeclauses = mergeclauses;
 	pathnode->outersortkeys = outersortkeys;
@@ -2785,8 +2798,8 @@ create_hashjoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = NIL;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_hashclauses = hashclauses;
 	/* final_cost_hashjoin will fill in pathnode->num_batches */
@@ -2827,6 +2840,9 @@ create_projection_path(PlannerInfo *root,
 		Assert(subpp->path.parent == rel);
 		subpath = subpp->subpath;
 		Assert(!IsA(subpath, ProjectionPath));
+
+		/* Free it if not used anywhere else. */
+		unlink_path((Path **) &subpp);
 	}
 
 	pathnode->path.pathtype = T_Result;
@@ -2842,7 +2858,7 @@ create_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -2961,21 +2977,21 @@ apply_projection_to_path(PlannerInfo *root,
 		{
 			GatherPath *gpath = (GatherPath *) path;
 
-			gpath->subpath = (Path *)
-				create_projection_path(root,
-									   gpath->subpath->parent,
-									   gpath->subpath,
-									   target);
+			link_path(&gpath->subpath,
+					  (Path *) create_projection_path(root,
+													  gpath->subpath->parent,
+													  gpath->subpath,
+													  target));
 		}
 		else
 		{
 			GatherMergePath *gmpath = (GatherMergePath *) path;
 
-			gmpath->subpath = (Path *)
-				create_projection_path(root,
-									   gmpath->subpath->parent,
-									   gmpath->subpath,
-									   target);
+			link_path(&gmpath->subpath,
+					  (Path *) create_projection_path(root,
+													  gmpath->subpath->parent,
+													  gmpath->subpath,
+													  target));
 		}
 	}
 	else if (path->parallel_safe &&
@@ -3024,7 +3040,7 @@ create_set_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order XXX? */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Estimate number of rows produced by SRFs for each row of input; if
@@ -3093,7 +3109,7 @@ create_incremental_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
@@ -3140,7 +3156,7 @@ create_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_sort(&pathnode->path, root, pathkeys,
 			  subpath->total_cost,
@@ -3186,7 +3202,7 @@ create_group_path(PlannerInfo *root,
 	/* Group doesn't change sort ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->groupClause = groupClause;
 	pathnode->qual = qual;
@@ -3244,7 +3260,7 @@ create_upper_unique_path(PlannerInfo *root,
 	/* Unique doesn't change the input ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->numkeys = numCols;
 
 	/*
@@ -3317,7 +3333,7 @@ create_agg_path(PlannerInfo *root,
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->aggstrategy = aggstrategy;
 	pathnode->aggsplit = aggsplit;
@@ -3380,7 +3396,7 @@ create_groupingsets_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Simplify callers by downgrading AGG_SORTED to AGG_PLAIN, and AGG_MIXED
@@ -3630,7 +3646,7 @@ create_windowagg_path(PlannerInfo *root,
 	/* WindowAgg preserves the input sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->winclause = winclause;
 	pathnode->qual = qual;
 	pathnode->topwindow = topwindow;
@@ -3699,7 +3715,7 @@ create_setop_path(PlannerInfo *root,
 	pathnode->path.pathkeys =
 		(strategy == SETOP_SORTED) ? subpath->pathkeys : NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->cmd = cmd;
 	pathnode->strategy = strategy;
 	pathnode->distinctList = distinctList;
@@ -3758,8 +3774,8 @@ create_recursiveunion_path(PlannerInfo *root,
 	/* RecursiveUnion result is always unsorted */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->leftpath = leftpath;
-	pathnode->rightpath = rightpath;
+	link_path(&pathnode->leftpath, leftpath);
+	link_path(&pathnode->rightpath, rightpath);
 	pathnode->distinctList = distinctList;
 	pathnode->wtParam = wtParam;
 	pathnode->numGroups = numGroups;
@@ -3801,7 +3817,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->rowMarks = rowMarks;
 	pathnode->epqParam = epqParam;
 
@@ -3904,7 +3920,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
 		pathnode->path.pathtarget->width = 0;
 	}
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->operation = operation;
 	pathnode->canSetTag = canSetTag;
 	pathnode->nominalRelation = nominalRelation;
@@ -3962,7 +3978,7 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.startup_cost = subpath->startup_cost;
 	pathnode->path.total_cost = subpath->total_cost;
 	pathnode->path.pathkeys = subpath->pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->limitOffset = limitOffset;
 	pathnode->limitCount = limitCount;
 	pathnode->limitOption = limitOption;
@@ -4243,6 +4259,7 @@ do { \
 	(path) = reparameterize_path_by_child(root, (path), child_rel); \
 	if ((path) == NULL) \
 		return NULL; \
+	link_path(&(path), (path)); \
 } while(0)
 
 #define REPARAMETERIZE_CHILD_PATH_LIST(pathlist) \
@@ -4534,11 +4551,14 @@ reparameterize_pathlist_by_child(PlannerInfo *root,
 
 		if (path == NULL)
 		{
+			/* TODO: unlink paths in the list */
 			list_free(result);
 			return NIL;
 		}
 
+		/* TODO: need a routine to link a path into a linked list */
 		result = lappend(result, path);
+		path->ref_count++;
 	}
 
 	return result;
-- 
2.25.1

Attachment: 0003-Local-variables-pointing-to-path-node-used--20240206.patch (text/x-patch)
From 97ba1883e3df294294ea652f684b3503de63f247 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 17 Jul 2023 15:10:45 +0530
Subject: [PATCH 3/3] Local variables pointing to path node used repeatedly

In match_unsorted_outer() we create a materialize path for inner
relation and pass it to try_nestloop_path repeatedly. Link and unlink
the path to and from the local variable to keep track of it.
---
 src/backend/optimizer/path/joinpath.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 6aca66f196..36dffb0af6 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1801,12 +1801,14 @@ match_unsorted_outer(PlannerInfo *root,
 		/*
 		 * Consider materializing the cheapest inner path, unless
 		 * enable_material is off or the path in question materializes its
-		 * output anyway.
+		 * output anyway. Link the path to a local reference so that it won't
+		 * be freed by add_path if the surrounding nest path is freed by
+		 * add_path. It will get freed if not used later.
 		 */
 		if (enable_material && inner_cheapest_total != NULL &&
 			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
-			matpath = (Path *)
-				create_material_path(innerrel, inner_cheapest_total);
+			link_path(&matpath,
+					  (Path *) create_material_path(innerrel, inner_cheapest_total));
 	}
 
 	foreach(lc1, outerrel->pathlist)
@@ -1922,6 +1924,10 @@ match_unsorted_outer(PlannerInfo *root,
 								 false);
 	}
 
+	/* Materialized inner path won't be used anymore. Unlink it */
+	if (matpath)
+		unlink_path(&matpath);
+
 	/*
 	 * Consider partial nestloop and mergejoin plan if outerrel has any
 	 * partial path and the joinrel is parallel-safe.  However, we can't
-- 
2.25.1

#11Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#10)
Re: Memory consumed by paths during partitionwise join planning

On 6/2/2024 19:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

Let me write words of opinion on that feature.
I generally like the idea of a refcount field. We had problems
generating alternative paths in extensions and designing the Asymmetric
Join feature [1] when we proposed an alternative path one level below
and called the add_path() routine. We had lost the path in the path list
referenced from the upper RelOptInfo. This approach allows us to make
more handy solutions instead of hacking with a copy/restore pathlist.
But I'm not sure about freeing unreferenced paths. I would have to see
alternatives in the pathlist.
About partitioning. As I discovered planning issues connected to
partitions, the painful problem is a rule, according to which we are
trying to use all nomenclature of possible paths for each partition.
With indexes, it quickly increases optimization work. IMO, this can help
a 'symmetrical' approach, which could restrict the scope of possible
pathways for upcoming partitions if we filter some paths in a set of
previously planned partitions.
Also, I am glad to see a positive opinion about the path_walker()
routine. Somewhere else, for example, in [2], it seems we need it too.

[1]: /messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com
[2]: /messages/by-id/CAMbWs496+N=UAjOc=rcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw@mail.gmail.com
--
regards,
Andrei Lepikhov
Postgres Professional

#12Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrei Lepikhov (#11)
Re: Memory consumed by paths during partitionwise join planning

On Thu, Feb 15, 2024 at 9:41 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 6/2/2024 19:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

Let me write words of opinion on that feature.
I generally like the idea of a refcount field. We had problems
generating alternative paths in extensions and designing the Asymmetric
Join feature [1] when we proposed an alternative path one level below
and called the add_path() routine. We had lost the path in the path list
referenced from the upper RelOptInfo. This approach allows us to make
more handy solutions instead of hacking with a copy/restore pathlist.

Thanks for your comments. I am glad that you find this useful.

But I'm not sure about freeing unreferenced paths. I would have to see
alternatives in the pathlist.

I didn't understand this. Can you please elaborate? A path in any
pathlist is referenced. An unreferenced path should not be in any
pathlist.

About partitioning. As I discovered planning issues connected to
partitions, the painful problem is a rule, according to which we are
trying to use all nomenclature of possible paths for each partition.
With indexes, it quickly increases optimization work. IMO, this can help
a 'symmetrical' approach, which could restrict the scope of possible
pathways for upcoming partitions if we filter some paths in a set of
previously planned partitions.

filter or free?

Also, I am glad to see a positive opinion about the path_walker()
routine. Somewhere else, for example, in [2], it seems we need it too.

Yes, I agree that we should introduce path_walker().

I am waiting for David's opinion about the performance numbers shared
in [1] before working further on the patches and bringing them out of the WIP
state.

[1]: postgr.es/m/CAExHW5uDkMQL8SicV3_=AYcsWwMTNR8GzJo310J7sv7TMLoL6Q@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

#13Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#12)
Re: Memory consumed by paths during partitionwise join planning

On 15/2/2024 19:06, Ashutosh Bapat wrote:

On Thu, Feb 15, 2024 at 9:41 AM Andrei Lepikhov

But I'm not sure about freeing unreferenced paths. I would have to see
alternatives in the pathlist.

I didn't understand this. Can you please elaborate? A path in any
pathlist is referenced. An unreferenced path should not be in any
pathlist.

I mean that at some point, an extension can reconsider the path tree
after building the top node of this path. I vaguely recall that we
already have (or had) kind of such optimization in the core where part
of the plan changes after it has been built.
Live example: right now, I am working on code like MSSQL has: a
combination of NestLoop and HashJoin paths with switching between them at
run time. It requires both paths in the path list at the moment the
extension is invoked. Even if one of them isn't referenced from the
upper pathlist, it may still be helpful for the extension.

About partitioning. As I discovered planning issues connected to
partitions, the painful problem is a rule, according to which we are
trying to use all nomenclature of possible paths for each partition.
With indexes, it quickly increases optimization work. IMO, this can help
a 'symmetrical' approach, which could restrict the scope of possible
pathways for upcoming partitions if we filter some paths in a set of
previously planned partitions.

filter or free?

Filter.
I meant that Postgres tries to apply IndexScan, BitmapScan,
IndexOnlyScan, and other strategies, iterating over the partition
indexes. The optimizer spends a lot of time doing that. So, why not
introduce a symmetrical strategy and exclude from the search some
scan types based on the path-building experience of previously planned
partitions of the same table: if you have dozens of partitions, is it
beneficial for the system to find a slightly more optimal IndexScan on one
partition while having SeqScans on 999 others?

--
regards,
Andrei Lepikhov
Postgres Professional

#14Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrei Lepikhov (#13)
Re: Memory consumed by paths during partitionwise join planning

On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Live example: right now, I am working on code like MSSQL has: a
combination of NestLoop and HashJoin paths with switching between them at
run time. It requires both paths in the path list at the moment the
extension is invoked. Even if one of them isn't referenced from the
upper pathlist, it may still be helpful for the extension.

There is no guarantee that every path presented to add_path will be
preserved. Suboptimal paths are freed as and when add_path discovers
that they are suboptimal. So I don't think an extension can rely on
existence of a path. But having a refcount makes it easy to preserve
the required paths by referencing them.

About partitioning. As I discovered planning issues connected to
partitions, the painful problem is a rule, according to which we are
trying to use all nomenclature of possible paths for each partition.
With indexes, it quickly increases optimization work. IMO, this can help
a 'symmetrical' approach, which could restrict the scope of possible
pathways for upcoming partitions if we filter some paths in a set of
previously planned partitions.

filter or free?

Filter.
I meant that Postgres tries to apply IndexScan, BitmapScan,
IndexOnlyScan, and other strategies, iterating over the partition
indexes. The optimizer spends a lot of time doing that. So, why not
introduce a symmetrical strategy and exclude from the search some
scan types based on the path-building experience of previously planned
partitions of the same table: if you have dozens of partitions, is it
beneficial for the system to find a slightly more optimal IndexScan on one
partition while having SeqScans on 999 others?

IIUC, you are suggesting that instead of planning each
partition/partitionwise join, we only create paths with the strategies
which were found to be optimal with previous partitions. That's a good
heuristic but it won't work if partition properties - statistics,
indexes etc. differ between groups of partitions.

--
Best Wishes,
Ashutosh Bapat

#15Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#14)
Re: Memory consumed by paths during partitionwise join planning

On 19/2/2024 19:25, Ashutosh Bapat wrote:

On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Live example: right now, I am working on code like MSSQL has: a
combination of NestLoop and HashJoin paths with switching between them at
run time. It requires both paths in the path list at the moment the
extension is invoked. Even if one of them isn't referenced from the
upper pathlist, it may still be helpful for the extension.

There is no guarantee that every path presented to add_path will be
preserved. Suboptimal paths are freed as and when add_path discovers
that they are suboptimal. So I don't think an extension can rely on
existence of a path. But having a refcount makes it easy to preserve
the required paths by referencing them.

I don't insist; I'm just describing my use case. It would be ideal if you
provided some external routines for extensions that allow keeping a
path in the pathlist even when it has a terrible cost estimate.

About partitioning. As I discovered planning issues connected to
partitions, the painful problem is a rule, according to which we are
trying to use all nomenclature of possible paths for each partition.
With indexes, it quickly increases optimization work. IMO, this can help
a 'symmetrical' approach, which could restrict the scope of possible
pathways for upcoming partitions if we filter some paths in a set of
previously planned partitions.

filter or free?

Filter.
I meant that Postgres tries to apply IndexScan, BitmapScan,
IndexOnlyScan, and other strategies, iterating over the partition
indexes. The optimizer spends a lot of time doing that. So, why not
introduce a symmetrical strategy and exclude from the search some
scan types based on the path-building experience of previously planned
partitions of the same table: if you have dozens of partitions, is it
beneficial for the system to find a slightly more optimal IndexScan on one
partition while having SeqScans on 999 others?

IIUC, you are suggesting that instead of planning each
partition/partitionwise join, we only create paths with the strategies
which were found to be optimal with previous partitions. That's a good
heuristic but it won't work if partition properties - statistics,
indexes etc. differ between groups of partitions.

Sure, but the "Symmetry" strategy assumes that at the scale of a
thousand partitions, especially with a parallel append involved, finding a
slightly suboptimal path in a small subset of partitions doesn't cause
noticeable performance degradation. Does that make sense?
As I see it, when people use 10-100 partitions for a table, they usually
strive to keep indexes symmetrical across all partitions.

--
regards,
Andrei Lepikhov
Postgres Professional

#16Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrei Lepikhov (#15)
Re: Memory consumed by paths during partitionwise join planning

On Tue, Feb 20, 2024 at 8:19 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 19/2/2024 19:25, Ashutosh Bapat wrote:

On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Live example: right now, I am working on the code like MSSQL has - a
combination of NestLoop and HashJoin paths and switching between them in
real-time. It requires both paths in the path list at the moment when
extensions are coming. Even if one of them isn't referenced from the
upper pathlist, it may still be helpful for the extension.

There is no guarantee that every path presented to add_path will be
preserved. Suboptimal paths are freed as and when add_path discovers
that they are suboptimal. So I don't think an extension can rely on
existence of a path. But having a refcount makes it easy to preserve
the required paths by referencing them.

I don't insist; I'm just describing my use case. It would be ideal if you
provided some external routines for extensions that allow keeping a
path in the pathlist even when it has a terrible cost estimate.

With refcounts you can reference a path and store it somewhere other than
the pathlist. The path won't be lost until it is dereferenced.
RelOptInfo::pathlist is for optimal paths.

IIUC, you are suggesting that instead of planning each
partition/partitionwise join, we only create paths with the strategies
which were found to be optimal with previous partitions. That's a good
heuristic but it won't work if partition properties - statistics,
indexes etc. differ between groups of partitions.

Sure, but the "Symmetry" strategy assumes that at the scale of a
thousand partitions, especially with a parallel append involved, finding a
slightly suboptimal path in a small subset of partitions doesn't cause
noticeable performance degradation. Does that make sense?
As I see it, when people use 10-100 partitions for a table, they usually
strive to keep indexes symmetrical across all partitions.

I agree that we need something like that. In order to do that, we need
machinery to prove that all partitions have similar properties. Once
that is proved, we can skip creating paths for similar partitions. But
that's out of scope of this work and complements it.

--
Best Wishes,
Ashutosh Bapat

#17Andrei Lepikhov
lepihov@gmail.com
In reply to: Ashutosh Bapat (#10)
Re: Memory consumed by paths during partitionwise join planning

On 6/2/2024 13:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

That looks pretty small considering the benefits. What do you think?

[1] /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

If you want to experiment, please use attached patches. There's a fix
for segfault during initdb in them. The patches are still raw.

First patch is no longer required. Here's rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

I think this work is promising, especially in the scope of partitioning.
I've analysed how it works by basing these patches on the current
master. You propose freeing unused paths after the end of the standard
path search.
In my view, upper paths generally consume less memory than join and scan
paths. This is primarily due to the limited options provided by Postgres
so far.
At the same time, this technique (while highly useful in general) adds
fragility and increases complexity: a developer needs to remember to
link the path using the pointer in different places of the code.
So, maybe go the alternative way? Invent a subquery memory context and
store all the path allocations there. It can be freed after setrefs
finishes this subquery planning without pulling up this subquery.

--
regards, Andrei Lepikhov

#18Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrei Lepikhov (#17)
Re: Memory consumed by paths during partitionwise join planning

On Thu, Sep 19, 2024 at 4:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 6/2/2024 13:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

That looks pretty small considering the benefits. What do you think?

[1] /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

If you want to experiment, please use attached patches. There's a fix
for segfault during initdb in them. The patches are still raw.

First patch is no longer required. Here's rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

I think this work is promising, especially in the scope of partitioning.
I've analysed how it works by basing these patches on the current
master. You propose freeing unused paths after the end of the standard
path search.
In my view, upper paths generally consume less memory than join and scan
paths. This is primarily due to the limited options provided by Postgres
so far.
At the same time, this technique (while highly useful in general) adds
fragility and increases complexity: a developer needs to remember to
link the path using the pointer in different places of the code.
So, maybe go the alternative way? Invent a subquery memory context and
store all the path allocations there. It can be freed after setrefs
finishes this subquery planning without pulling up this subquery.

Using a memory context per subquery won't help with partitioning, right?
If the outermost query has a partitioned table with thousands of
partitions, it will still accumulate those paths till the very end of
planning. We could instead use memory context/s to store all the paths
created, then copy the optimal paths into a new memory context at
strategic points and blow up the old memory context. And repeat this
till we choose the final path and create a plan out of it; at that
point we could blow up the memory context containing remaining paths
as well. That will free the paths as soon as they are rendered
useless. I discussed this idea with Alvaro offline. We thought that
this approach needs some code to copy paths and then copying paths
recursively has some overhead of itself. The current work of adding a
reference count, OTOH has potential to bring discipline into the way
we handle paths. We need to avoid risks posed by dangling pointers.
For which Alvaro suggested looking at the way we manage snapshots. But
I didn't get time to explore that idea yet.

--
Best Wishes,
Ashutosh Bapat

#19Andrei Lepikhov
lepihov@gmail.com
In reply to: Ashutosh Bapat (#18)
Re: Memory consumed by paths during partitionwise join planning

On 19/9/2024 13:12, Ashutosh Bapat wrote:

On Thu, Sep 19, 2024 at 4:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

At the same time, this technique (while highly useful in general) adds
fragility and increases complexity: a developer needs to remember to
link the path using the pointer in different places of the code.
So, maybe go the alternative way? Invent a subquery memory context and
store all the path allocations there. It can be freed after setrefs
finishes this subquery planning without pulling up this subquery.

Using memory context for subquery won't help with partitioning right?
If the outermost query has a partitioned table with thousands of
partitions, it will still accumulate those paths till the very end of
planning.

I got it. I just haven't dealt with huge partitioned tables at the outer level before.

We could instead use memory context/s to store all the paths
created, then copy the optimal paths into a new memory context at
strategic points and blow up the old memory context. And repeat this
till we choose the final path and create a plan out of it; at that
point we could blow up the memory context containing remaining paths
as well. That will free the paths as soon as they are rendered
useless.

I think any scalable solution should be based on a per-partition
cleanup. For starters, why not adopt Tom's patch [1] for selectivity
estimations? We will see the profit in the case of long lists of clauses.

I discussed this idea with Alvaro offline. We thought that
this approach needs some code to copy paths and then copying paths
recursively has some overhead of itself.

It needs a path_tree_walker first. We discussed it before, but that failed.
Maybe design it beforehand and use it in the re-parameterisation code?

The current work of adding a
reference count, OTOH has potential to bring discipline into the way
we handle paths. We need to avoid risks posed by dangling pointers.

Both hands up for having reference counters: it is always painful in
extensions to invent an approach to safely remove a path you want to
replace with a custom one. I just want to say it looks too dangerous
compared to the value of a possible positive outcome.

For which Alvaro suggested looking at the way we manage snapshots. But
I didn't get time to explore that idea yet.

Unfortunately, I can't understand this idea without an explanation.

[1]: Optimize planner memory consumption for huge arrays,
/messages/by-id/1367418.1708816059@sss.pgh.pa.us

--
regards, Andrei Lepikhov

#20Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#10)
3 attachment(s)
Re: Memory consumed by paths during partitionwise join planning

On 6/2/2024 13:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
First patch is no longer required. Here's rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

I think this project is quite important to move forward and discover how
far we can go this way. The attached rebased patch set includes fixes
that allow it to pass the regression tests.

This idea of a refcounter resolves the problem of blind usage of
add_path. It is not only about extensions, which sometimes want to add
paths at different levels of the join tree and don't care about dangling
pointers. It is also about possible hidden bugs (a freed path staying in
a path list, just not employed) that aren't yet detected because of cost
values and low test coverage.

--
regards,
Andrei Lepikhov
Postgres Professional

Attachments:

Attachment: 0001-Basic-infrastructure-to-link-unlink-and-free-pathnod-20250627.patch (text/plain)
From 1b2a42da985fbd0287472ba3b348528d8931ba9d Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Thu, 26 Jun 2025 15:11:03 +0200
Subject: [PATCH 1/3] Basic infrastructure to link, unlink, and free pathnodes.

add_path and add_partial_path free path nodes that they deem suboptimal.
But they just free the uppermost pathnode in the path. The subpaths continue
to occupy memory even if they are not used anywhere. The subpath nodes are
not freed because other paths may reference them. This commit introduces
the infrastructure to track references to paths.

As the path nodes are referenced, they are "linked" using link_path() to
reference objects. When the path nodes are no longer helpful, they are
"unlinked" using unlink_path() from the referencing objects. The paths whose
references reach zero during unlinking are freed automatically using
the free_path() function. The function unlinks the sub-path nodes and can be
called when freeing any path node.

When the final path for the join tree is chosen, the paths linked to each
participating relation are unlinked, thus ultimately getting freed if not used.

TODO
The functions free_path(), unlink_path() and link_path() have ereports
to catch code paths that do not use these functions correctly. They call
errbacktrace(), which is not used anywhere else. These ereport calls will
need to be fixed for productization.
---
 src/backend/optimizer/path/allpaths.c |  83 +++++++++++++++
 src/backend/optimizer/util/pathnode.c | 142 ++++++++++++++++++++++++++
 src/include/nodes/pathnodes.h         |   1 +
 src/include/optimizer/pathnode.h      |  38 +++++++
 4 files changed, 264 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..a5d48397ac3 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3418,6 +3418,87 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 	}
 }
 
+/*
+ * TODO: Need similar code to free paths in upper relations.
+ */
+static void
+free_unused_paths_from_rel(RelOptInfo *rel)
+{
+	ListCell   *lc_path;
+	int			cnt_part;
+
+	foreach(lc_path, rel->pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			/* TODO: use routine to unlink path from list */
+			rel->pathlist = foreach_delete_current(rel->pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/* Do the same for partial pathlist. */
+	foreach(lc_path, rel->partial_pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path);
+			unlink_path(&path);
+		}
+	}
+
+	/*
+	 * TODO: We can perform this in generate_partitionwise_paths as well since
+	 * by that time all the paths from partitions will be linked if used.
+	 */
+
+	/* Free unused paths from the partition relations */
+	for (cnt_part = 0; cnt_part < rel->nparts; cnt_part++)
+	{
+		RelOptInfo *child_rel = rel->part_rels[cnt_part];
+
+		if (child_rel == NULL || IS_DUMMY_REL(child_rel))
+			/* Skip empty entries */
+			continue;
+
+		free_unused_paths_from_rel(child_rel);
+	}
+}
+
+/*
+ * Free unused paths from all the relations created while building the join tree.
+ */
+static void
+free_unused_paths(PlannerInfo *root, int levels_needed)
+{
+	int			cnt;
+
+	for (cnt = levels_needed - 1; cnt >= 1; cnt--)
+	{
+		ListCell   *lc;
+
+		foreach(lc, root->join_rel_level[cnt])
+		{
+			free_unused_paths_from_rel(lfirst(lc));
+		}
+	}
+
+	for (cnt = 0; cnt < root->simple_rel_array_size; cnt++)
+	{
+		RelOptInfo *rel = root->simple_rel_array[cnt];
+
+		/* Skip empty slots */
+		if (rel != NULL)
+			free_unused_paths_from_rel(rel);
+	}
+}
+
 /*
  * standard_join_search
  *	  Find possible joinpaths for a query by successively finding ways
@@ -3520,6 +3601,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		}
 	}
 
+	free_unused_paths(root, levels_needed);
+
 	/*
 	 * We should have a single rel at the final level.
 	 */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..fa13a8f8ff1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -969,6 +969,148 @@ add_partial_path_precheck(RelOptInfo *parent_rel, int disabled_nodes,
 	return true;
 }
 
+void
+free_path(Path *path)
+{
+	/*
+	 * If the path is still referenced, freeing it would create a dangling
+	 * pointer.
+	 */
+	/* TODO: it could just be an Assert? */
+	if (path->ref_count != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d has reference count %d, can not be freed",
+						path, path->pathtype, path->ref_count)));
+		return;
+	}
+
+	/*
+	 * A path referenced in the parent relation's pathlist can't be freed.
+	 * Ideally such a path should have ref_count >= 1. If this happens it
+	 * indicates that the path was not linked somewhere, and yet unlinked
+	 * (usually by free_path()).
+	 */
+	if (list_member_ptr(path->parent->pathlist, path))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d has reference count %d but is linked in parent's pathlist, can not be freed",
+						path, path->pathtype, path->ref_count)));
+		return;
+	}
+
+	/* Decrement the reference counts of paths referenced by this one. */
+	switch (path->pathtype)
+	{
+		case T_SeqScan:
+		case T_IndexScan:
+		case T_IndexOnlyScan:
+			/* Simple paths reference no other path. */
+			break;
+
+		case T_MergeJoin:
+		case T_HashJoin:
+		case T_NestLoop:
+			{
+				JoinPath   *jpath = (JoinPath *) path;
+
+				unlink_path(&(jpath->outerjoinpath));
+				unlink_path(&(jpath->innerjoinpath));
+			}
+			break;
+
+		case T_Append:
+		case T_MergeAppend:
+			{
+				AppendPath *appath = (AppendPath *) path;
+				ListCell   *lc;
+
+				foreach(lc, appath->subpaths)
+				{
+					Path	   *subpath = lfirst(lc);
+
+					/* TODO use a routine to unlink path from list. */
+					appath->subpaths = foreach_delete_current(appath->subpaths, lc);
+					unlink_path(&subpath);
+				}
+			}
+			break;
+
+		case T_Gather:
+			{
+				GatherPath *gpath = (GatherPath *) path;
+
+				unlink_path(&(gpath->subpath));
+			}
+			break;
+
+		case T_GatherMerge:
+			{
+				GatherMergePath *gmpath = (GatherMergePath *) path;
+
+				unlink_path(&gmpath->subpath);
+			}
+			break;
+
+		case T_BitmapHeapScan:
+			{
+				BitmapHeapPath *bhpath = (BitmapHeapPath *) path;
+
+				unlink_path(&(bhpath->bitmapqual));
+			}
+			break;
+
+		case T_Material:
+			{
+				MaterialPath *mpath = (MaterialPath *) path;
+
+				unlink_path(&(mpath->subpath));
+			}
+			break;
+
+		case T_Memoize:
+			{
+				MemoizePath *mpath = (MemoizePath *) path;
+
+				unlink_path(&mpath->subpath);
+			}
+			break;
+
+		case T_Result:
+			{
+				/*
+				 * All Result paths except ProjectionPath are simple paths
+				 * without any subpath.
+				 */
+				if (IsA(path, ProjectionPath))
+				{
+					ProjectionPath *ppath = (ProjectionPath *) path;
+
+					unlink_path(&ppath->subpath);
+				}
+			}
+			break;
+
+		default:
+			ereport(WARNING,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errbacktrace(),
+					 errmsg("unrecognized path type %d", path->pathtype)));
+			break;
+	}
+
+	/*
+	 * TODO: add_path does not free IndexPaths, but we do that here. Is there
+	 * a hazard?
+	 */
+
+	/* Now reclaim the memory. */
+	pfree(path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..c88323a915b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1797,6 +1797,7 @@ typedef struct Path
 
 	/* sort ordering of path's output; a List of PathKey nodes; see above */
 	List	   *pathkeys;
+	int			ref_count;
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..22ee1c2261b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -16,6 +16,7 @@
 
 #include "nodes/bitmapset.h"
 #include "nodes/pathnodes.h"
+#include <limits.h>
 
 
 /*
@@ -306,6 +307,43 @@ extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
 										  RelOptInfo *child_rel);
 extern bool path_is_reparameterizable_by_child(Path *path,
 											   RelOptInfo *child_rel);
+extern void free_path(Path *path);
+
+static inline void
+link_path(Path **path_link, Path *path)
+{
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d has negative reference count %d",
+						path, path->pathtype, path->ref_count)));
+
+	path->ref_count++;
+	*path_link = path;
+	Assert(path->ref_count > 0 && path->ref_count <= INT_MAX);
+}
+
+static inline void
+unlink_path(Path **path_link)
+{
+	Path	   *path = *path_link;
+
+	/* A path to be unlinked should have been linked before. */
+	if (path->ref_count < 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errbacktrace(),
+				 errmsg("path node %p of type %d had negative reference count %d",
+						path, path->pathtype, path->ref_count)));
+
+	path->ref_count--;
+	*path_link = NULL;
+
+	/* Free path if none is referencing it. */
+	if (path->ref_count == 0)
+		free_path(path);
+}
 
 /*
  * prototypes for relnode.c
-- 
2.50.0

Attachment: 0002-Actual-code-to-use-pathnode-referencing-infrastructu-20250627.patch (text/plain)
From a376145f3770b0fc2ea9b828e074dc522c1abaf7 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 27 Jun 2025 11:42:06 +0200
Subject: [PATCH 2/3] Actual code to use pathnode referencing infrastructure

The commit uses the infrastructure built by the previous commit to
reference, unreference, and free paths.

The code is not completely mature. There are TODOs.
---
 src/backend/optimizer/path/allpaths.c |   8 +-
 src/backend/optimizer/util/pathnode.c | 212 +++++++++++++++++---------
 src/include/optimizer/pathnode.h      |   8 +-
 3 files changed, 147 insertions(+), 81 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a5d48397ac3..33b12ed50f9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3432,11 +3432,11 @@ free_unused_paths_from_rel(RelOptInfo *rel)
 		Path	   *path = (Path *) lfirst(lc_path);
 
 		/* Free the path if none references it. */
-		if (path->ref_count == 1)
+		if (path->ref_count == 0)
 		{
 			/* TODO: use routine to unlink path from list */
 			rel->pathlist = foreach_delete_current(rel->pathlist, lc_path);
-			unlink_path(&path);
+			unlink_path(&path, 0, true, true);
 		}
 	}
 
@@ -3446,10 +3446,10 @@ free_unused_paths_from_rel(RelOptInfo *rel)
 		Path	   *path = (Path *) lfirst(lc_path);
 
 		/* Free the path if none references it. */
-		if (path->ref_count == 1)
+		if (path->ref_count == 0)
 		{
 			rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path);
-			unlink_path(&path);
+			unlink_path(&path, 0, true, true);
 		}
 	}
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fa13a8f8ff1..0d820143ba3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -627,10 +627,10 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 														  p1);
 
 			/*
-			 * Delete the data pointed-to by the deleted cell, if possible
+			 * TODO: write a routine to unlink a path from the list node and
+			 * delete the list node
 			 */
-			if (!IsA(old_path, IndexPath))
-				pfree(old_path);
+			unlink_path(&old_path, 0, false, false);
 		}
 		else
 		{
@@ -658,12 +658,14 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place in pathlist */
 		parent_rel->pathlist =
 			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO: write a function to link_path in a List */
 	}
 	else
 	{
 		/* Reject and recycle the new path */
 		if (!IsA(new_path, IndexPath))
-			pfree(new_path);
+			free_path(new_path, 0, false);
 	}
 }
 
@@ -876,7 +878,8 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->partial_pathlist =
 				foreach_delete_current(parent_rel->partial_pathlist, p1);
-			pfree(old_path);
+			/* TODO use routine to unlink a path from the linked list */
+			unlink_path(&old_path, 0, false, false);
 		}
 		else
 		{
@@ -899,11 +902,13 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Accept the new path: insert it at proper place */
 		parent_rel->partial_pathlist =
 			list_insert_nth(parent_rel->partial_pathlist, insert_at, new_path);
+		new_path->ref_count++;
+		/* TODO create a routine to link path in a list at a given place */
 	}
 	else
 	{
 		/* Reject and recycle the new path */
-		pfree(new_path);
+		free_path(new_path, 0, false);
 	}
 }
 
@@ -970,8 +975,13 @@ add_partial_path_precheck(RelOptInfo *parent_rel, int disabled_nodes,
 }
 
 void
-free_path(Path *path)
+free_path(Path *path, int level, bool recurse)
 {
+	if (!recurse && level > 0)
+		return;
+
+	level++;
+
 	/*
 	 * If the path is still referenced, freeing it would create a dangling
 	 * pointer.
@@ -979,11 +989,6 @@ free_path(Path *path)
 	/* TODO: it could just be an Assert? */
 	if (path->ref_count != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_INTERNAL_ERROR),
-				 errbacktrace(),
-				 errmsg("path node %p of type %d has reference count %d, can not be freed",
-						path, path->pathtype, path->ref_count)));
 		return;
 	}
 
@@ -995,11 +1000,6 @@ free_path(Path *path)
 	 */
 	if (list_member_ptr(path->parent->pathlist, path))
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_INTERNAL_ERROR),
-				 errbacktrace(),
-				 errmsg("path node %p of type %d has reference count %d but is linked in parent's pathlist, can not be freed",
-						path, path->pathtype, path->ref_count)));
 		return;
 	}
 
@@ -1018,8 +1018,8 @@ free_path(Path *path)
 			{
 				JoinPath   *jpath = (JoinPath *) path;
 
-				unlink_path(&(jpath->outerjoinpath));
-				unlink_path(&(jpath->innerjoinpath));
+				unlink_path(&(jpath->outerjoinpath), level, recurse, true);
+				unlink_path(&(jpath->innerjoinpath), level, recurse, true);
 			}
 			break;
 
@@ -1035,7 +1035,7 @@ free_path(Path *path)
 
 					/* TODO use a routine to unlink path from list. */
 					appath->subpaths = foreach_delete_current(appath->subpaths, lc);
-					unlink_path(&subpath);
+					unlink_path(&subpath, level, recurse, true);
 				}
 			}
 			break;
@@ -1044,7 +1044,7 @@ free_path(Path *path)
 			{
 				GatherPath *gpath = (GatherPath *) path;
 
-				unlink_path(&(gpath->subpath));
+				unlink_path(&(gpath->subpath), level, recurse, true);
 			}
 			break;
 
@@ -1052,7 +1052,7 @@ free_path(Path *path)
 			{
 				GatherMergePath *gmpath = (GatherMergePath *) path;
 
-				unlink_path(&gmpath->subpath);
+				unlink_path(&gmpath->subpath, level, recurse, true);
 			}
 			break;
 
@@ -1060,7 +1060,7 @@ free_path(Path *path)
 			{
 				BitmapHeapPath *bhpath = (BitmapHeapPath *) path;
 
-				unlink_path(&(bhpath->bitmapqual));
+				unlink_path(&(bhpath->bitmapqual), level, recurse, true);
 			}
 			break;
 
@@ -1068,7 +1068,7 @@ free_path(Path *path)
 			{
 				MaterialPath *mpath = (MaterialPath *) path;
 
-				unlink_path(&(mpath->subpath));
+				unlink_path(&(mpath->subpath), level, recurse, true);
 			}
 			break;
 
@@ -1076,7 +1076,7 @@ free_path(Path *path)
 			{
 				MemoizePath *mpath = (MemoizePath *) path;
 
-				unlink_path(&mpath->subpath);
+				unlink_path(&mpath->subpath, level, recurse, true);
 			}
 			break;
 
@@ -1090,16 +1090,64 @@ free_path(Path *path)
 				{
 					ProjectionPath *ppath = (ProjectionPath *) path;
 
-					unlink_path(&ppath->subpath);
+					unlink_path(&ppath->subpath, level, recurse, true);
 				}
 			}
 			break;
+		case T_Agg:
+		{
+			AggPath *apath = (AggPath *) path;
+
+			unlink_path(&apath->subpath, level, recurse, true);
+		}
+			break;
+		case T_IncrementalSort:
+		{
+			IncrementalSortPath *ipath = (IncrementalSortPath *) path;
+
+			unlink_path(&ipath->spath.subpath, level, recurse, true);
+		}
+			break;
+		case T_WindowAgg:
+		{
+			WindowAggPath *wpath = (WindowAggPath *) path;
+
+			unlink_path(&wpath->subpath, level, recurse, true);
+		}
+			break;
+		case T_SubqueryScan:
+		{
+			SubqueryScanPath *spath = (SubqueryScanPath *) path;
+
+			unlink_path(&spath->subpath, level, recurse, true);
+		}
+			break;
+		case T_Unique:
+		{
+			UniquePath *upath = (UniquePath *) path;
 
+			unlink_path(&upath->subpath, level, recurse, true);
+		}
+			break;
+		case T_Limit:
+		{
+			LimitPath *lpath = (LimitPath *) path;
+
+			unlink_path(&lpath->subpath, level, recurse, true);
+		}
+			break;
+		case T_Group:
+		{
+			GroupPath *gpath = (GroupPath *) path;
+
+			unlink_path(&gpath->subpath, level, recurse, true);
+		}
+			break;
 		default:
-			ereport(WARNING,
+			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					 errbacktrace(),
-					 errmsg("unrecognized path type %d", path->pathtype)));
+					 errmsg("unrecognized path type %d", path->pathtype)));
 			break;
 	}
 
@@ -1256,7 +1304,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
 
-	pathnode->bitmapqual = bitmapqual;
+	link_path(&(pathnode->bitmapqual), bitmapqual);
 
 	cost_bitmap_heap_scan(&pathnode->path, root, rel,
 						  pathnode->path.param_info,
@@ -1291,6 +1339,8 @@ create_bitmap_and_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1343,6 +1393,8 @@ create_bitmap_or_path(PlannerInfo *root,
 	{
 		Path	   *bitmapqual = (Path *) lfirst(lc);
 
+		/* TODO: find a better way to link a path in a linked list */
+		bitmapqual->ref_count++;
 		required_outer = bms_add_members(required_outer,
 										 PATH_REQ_OUTER(bitmapqual));
 	}
@@ -1516,6 +1568,8 @@ create_append_path(PlannerInfo *root,
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		/* TODO: we should use the routine to link path to list */
+		subpath->ref_count++;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
@@ -1661,6 +1715,9 @@ create_merge_append_path(PlannerInfo *root,
 		/* All child paths should be unparameterized */
 		Assert(bms_is_empty(PATH_REQ_OUTER(subpath)));
 
+		/* TODO: use routine which links a path into a list */
+		subpath->ref_count++;
+
 		pathnode->path.rows += subpath->rows;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
@@ -1789,7 +1846,7 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_material(&pathnode->path,
 				  subpath->disabled_nodes,
@@ -1824,7 +1881,7 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->hash_operators = hash_operators;
 	pathnode->param_exprs = param_exprs;
 	pathnode->singlerow = singlerow;
@@ -1919,7 +1976,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 
 	/*
 	 * Under GEQO and when planning child joins, the sjinfo might be
@@ -1947,7 +2004,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = subpath->total_cost;
 		pathnode->path.pathkeys = subpath->pathkeys;
 
-		rel->cheapest_unique_path = (Path *) pathnode;
+		/* TODO: Do we need to make sure that cheapest_unique_path is NULL. */
+		link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 		MemoryContextSwitchTo(oldcontext);
 
@@ -1986,7 +2044,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 				pathnode->path.total_cost = subpath->total_cost;
 				pathnode->path.pathkeys = subpath->pathkeys;
 
-				rel->cheapest_unique_path = (Path *) pathnode;
+				link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 				MemoryContextSwitchTo(oldcontext);
 
@@ -2087,7 +2145,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		pathnode->path.total_cost = sort_path.total_cost;
 	}
 
-	rel->cheapest_unique_path = (Path *) pathnode;
+	link_path(&(rel->cheapest_unique_path), (Path *) pathnode);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -2129,7 +2187,7 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->path.pathtarget = target ? target : rel->reltarget;
@@ -2200,7 +2258,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = NIL;	/* Gather has unordered result */
 
-	pathnode->subpath = subpath;
+	link_path(&(pathnode->subpath), subpath);
 	pathnode->num_workers = subpath->parallel_workers;
 	pathnode->single_copy = false;
 
@@ -2243,7 +2301,7 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_subqueryscan(pathnode, root, rel, pathnode->path.param_info,
 					  trivial_pathtarget);
@@ -2475,7 +2533,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2529,7 +2587,7 @@ create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2578,7 +2636,7 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->fdw_outerpath = fdw_outerpath;
+	link_path(&pathnode->fdw_outerpath, fdw_outerpath);
 	pathnode->fdw_restrictinfo = fdw_restrictinfo;
 	pathnode->fdw_private = fdw_private;
 
@@ -2741,8 +2799,8 @@ create_nestloop_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 
 	final_cost_nestloop(root, pathnode, workspace, extra);
@@ -2807,8 +2865,8 @@ create_mergejoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_mergeclauses = mergeclauses;
 	pathnode->outersortkeys = outersortkeys;
@@ -2885,8 +2943,8 @@ create_hashjoin_path(PlannerInfo *root,
 	pathnode->jpath.path.pathkeys = NIL;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.inner_unique = extra->inner_unique;
-	pathnode->jpath.outerjoinpath = outer_path;
-	pathnode->jpath.innerjoinpath = inner_path;
+	link_path(&(pathnode->jpath.outerjoinpath), outer_path);
+	link_path(&(pathnode->jpath.innerjoinpath), inner_path);
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_hashclauses = hashclauses;
 	/* final_cost_hashjoin will fill in pathnode->num_batches */
@@ -2927,6 +2985,9 @@ create_projection_path(PlannerInfo *root,
 		Assert(subpp->path.parent == rel);
 		subpath = subpp->subpath;
 		Assert(!IsA(subpath, ProjectionPath));
+
+		/* Free it if not used anywhere else. */
+		unlink_path((Path **) &subpp, 0, false, false);
 	}
 
 	pathnode->path.pathtype = T_Result;
@@ -2942,7 +3003,7 @@ create_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -3063,21 +3124,21 @@ apply_projection_to_path(PlannerInfo *root,
 		{
 			GatherPath *gpath = (GatherPath *) path;
 
-			gpath->subpath = (Path *)
-				create_projection_path(root,
-									   gpath->subpath->parent,
-									   gpath->subpath,
-									   target);
+			link_path(&gpath->subpath,
+					  (Path *) create_projection_path(root,
+													  gpath->subpath->parent,
+													  gpath->subpath,
+													  target));
 		}
 		else
 		{
 			GatherMergePath *gmpath = (GatherMergePath *) path;
 
-			gmpath->subpath = (Path *)
-				create_projection_path(root,
-									   gmpath->subpath->parent,
-									   gmpath->subpath,
-									   target);
+			link_path(&gmpath->subpath,
+					  (Path *) create_projection_path(root,
+													  gmpath->subpath->parent,
+													  gmpath->subpath,
+													  target));
 		}
 	}
 	else if (path->parallel_safe &&
@@ -3126,7 +3187,7 @@ create_set_projection_path(PlannerInfo *root,
 	/* Projection does not change the sort order XXX? */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Estimate number of rows produced by SRFs for each row of input; if
@@ -3196,7 +3257,7 @@ create_incremental_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_incremental_sort(&pathnode->path,
 						  root, pathkeys, presorted_keys,
@@ -3244,7 +3305,7 @@ create_sort_path(PlannerInfo *root,
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	cost_sort(&pathnode->path, root, pathkeys,
 			  subpath->disabled_nodes,
@@ -3291,7 +3352,7 @@ create_group_path(PlannerInfo *root,
 	/* Group doesn't change sort ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->groupClause = groupClause;
 	pathnode->qual = qual;
@@ -3350,7 +3411,7 @@ create_upper_unique_path(PlannerInfo *root,
 	/* Unique doesn't change the input ordering */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->numkeys = numCols;
 
 	/*
@@ -3424,7 +3485,7 @@ create_agg_path(PlannerInfo *root,
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	pathnode->aggstrategy = aggstrategy;
 	pathnode->aggsplit = aggsplit;
@@ -3488,7 +3549,7 @@ create_groupingsets_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 
 	/*
 	 * Simplify callers by downgrading AGG_SORTED to AGG_PLAIN, and AGG_MIXED
@@ -3746,7 +3807,7 @@ create_windowagg_path(PlannerInfo *root,
 	/* WindowAgg preserves the input sort order */
 	pathnode->path.pathkeys = subpath->pathkeys;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->winclause = winclause;
 	pathnode->qual = qual;
 	pathnode->runCondition = runCondition;
@@ -3818,8 +3879,9 @@ create_setop_path(PlannerInfo *root,
 	pathnode->path.pathkeys =
 		(strategy == SETOP_SORTED) ? leftpath->pathkeys : NIL;
 
-	pathnode->leftpath = leftpath;
-	pathnode->rightpath = rightpath;
+	link_path(&pathnode->leftpath, leftpath);
+	link_path(&pathnode->rightpath, rightpath);
+
 	pathnode->cmd = cmd;
 	pathnode->strategy = strategy;
 	pathnode->groupList = groupList;
@@ -3934,8 +3996,8 @@ create_recursiveunion_path(PlannerInfo *root,
 	/* RecursiveUnion result is always unsorted */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->leftpath = leftpath;
-	pathnode->rightpath = rightpath;
+	link_path(&pathnode->leftpath, leftpath);
+	link_path(&pathnode->rightpath, rightpath);
 	pathnode->distinctList = distinctList;
 	pathnode->wtParam = wtParam;
 	pathnode->numGroups = numGroups;
@@ -3977,7 +4039,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	pathnode->path.pathkeys = NIL;
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->rowMarks = rowMarks;
 	pathnode->epqParam = epqParam;
 
@@ -4084,7 +4146,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
 		pathnode->path.pathtarget->width = 0;
 	}
 
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->operation = operation;
 	pathnode->canSetTag = canSetTag;
 	pathnode->nominalRelation = nominalRelation;
@@ -4144,7 +4206,7 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.startup_cost = subpath->startup_cost;
 	pathnode->path.total_cost = subpath->total_cost;
 	pathnode->path.pathkeys = subpath->pathkeys;
-	pathnode->subpath = subpath;
+	link_path(&pathnode->subpath, subpath);
 	pathnode->limitOffset = limitOffset;
 	pathnode->limitCount = limitCount;
 	pathnode->limitOption = limitOption;
@@ -4430,6 +4492,7 @@ do { \
 	(path) = reparameterize_path_by_child(root, (path), child_rel); \
 	if ((path) == NULL) \
 		return NULL; \
+	link_path(&(path), (path)); \
 } while(0)
 
 #define REPARAMETERIZE_CHILD_PATH_LIST(pathlist) \
@@ -4852,11 +4915,14 @@ reparameterize_pathlist_by_child(PlannerInfo *root,
 
 		if (path == NULL)
 		{
+			/* TODO: unlink paths in the list */
 			list_free(result);
 			return NIL;
 		}
 
+		/* TODO: need a routine to link a path into a linked list */
 		result = lappend(result, path);
+		path->ref_count++;
 	}
 
 	return result;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 22ee1c2261b..da37771bf86 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -307,7 +307,7 @@ extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
 										  RelOptInfo *child_rel);
 extern bool path_is_reparameterizable_by_child(Path *path,
 											   RelOptInfo *child_rel);
-extern void free_path(Path *path);
+extern void free_path(Path *path, int level, bool recurse);
 
 static inline void
 link_path(Path **path_link, Path *path)
@@ -325,7 +325,7 @@ link_path(Path **path_link, Path *path)
 }
 
 static inline void
-unlink_path(Path **path_link)
+unlink_path(Path **path_link, int level, bool recurse, bool need_free)
 {
 	Path	   *path = *path_link;
 
@@ -341,8 +341,8 @@ unlink_path(Path **path_link)
 	*path_link = NULL;
 
 	/* Free path if none is referencing it. */
-	if (path->ref_count == 0)
-		free_path(path);
+	if (path->ref_count == 0 && need_free)
+		free_path(path, level, recurse);
 }
 
 /*
-- 
2.50.0

Attachment: 0003-Local-variables-pointing-to-path-node-used-repeatedl-20250627.patch (text/plain)
From 0f5a5bd14ce2791f24f855f8176b470a0ba8884d Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 27 Jun 2025 11:46:16 +0200
Subject: [PATCH 3/3] Local variables pointing to path node used repeatedly

In match_unsorted_outer() we create a materialize path for the inner
relation and pass it to try_nestloop_path repeatedly. Link and unlink
the path to and from a local variable to keep track of it.
---
 src/backend/optimizer/path/joinpath.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 7aa8f5d799c..275fe9cfb58 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1892,12 +1892,14 @@ match_unsorted_outer(PlannerInfo *root,
 		/*
 		 * Consider materializing the cheapest inner path, unless
 		 * enable_material is off or the path in question materializes its
-		 * output anyway.
+		 * output anyway. Link the path to a local reference so that it won't
+		 * be freed by add_path if the surrounding nest path is freed by
+		 * add_path. It will get freed if not used later.
 		 */
 		if (enable_material && inner_cheapest_total != NULL &&
 			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
-			matpath = (Path *)
-				create_material_path(innerrel, inner_cheapest_total);
+			link_path(&matpath,
+					  (Path *) create_material_path(innerrel, inner_cheapest_total));
 	}
 
 	foreach(lc1, outerrel->pathlist)
@@ -2013,6 +2015,10 @@ match_unsorted_outer(PlannerInfo *root,
 								 false);
 	}
 
+	/* Materialized inner path won't be used anymore. Unlink it */
+	if (matpath)
+		unlink_path(&matpath, 0, false, true);
+
 	/*
 	 * Consider partial nestloop and mergejoin plan if outerrel has any
 	 * partial path and the joinrel is parallel-safe.  However, we can't
-- 
2.50.0

#21Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#20)
Re: Memory consumed by paths during partitionwise join planning

On 27/6/2025 12:01, Andrei Lepikhov wrote:

On 6/2/2024 13:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
First patch is no longer required. Here's rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

I think this project is quite important to move forward and discover how
far we can go this way. Attached is a rebased patch set with fixes that
passes the regression tests.

This idea of a refcounter resolves the problem of blind usage of
add_path. It is not only about extensions, which sometimes want to add
paths at different levels of the join tree and don't care about dangling
pointers. It is also about possible hidden bugs (a freed path staying in
the path list, just not employed) that aren't detected yet due to
costing and low test coverage.

After further consideration, I believe the main issue lies in managing
increments and decrements of a path refcounter, especially in light of
ongoing code changes. Additionally, recursion presents another challenge.

To address this complexity, I propose the following solutions:
1. Localise reference management within the add_path function.
2. Transition from a 'strict' approach, which removes all unused paths,
to a more straightforward 'pinning' method. In this approach, we would
simply mark a path as 'used' when it is referenced by a path added to an
upper path list. We would reclaim less memory, but the code stays much
simpler.

This strategy would help us avoid recursion and dereferencing, which
significantly complicate the code.

--
regards, Andrei Lepikhov

#22Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrei Lepikhov (#21)
Re: Memory consumed by paths during partitionwise join planning

On Mon, Jul 7, 2025 at 8:43 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 27/6/2025 12:01, Andrei Lepikhov wrote:

On 6/2/2024 13:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
The first patch is no longer required. Here's the rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

I think this project is quite important to move forward and discover how
far we can go this way. The attached rebased patch set includes fixes
that allow it to pass the regression tests.

+ case T_Group:
+ {
+ GroupPath *gpath = (GroupPath *) path;
+
+ unlink_path(&gpath->subpath, level, recurse, true);
+ }
+ break;

Thanks for adding those path types. Should we just create a path
walker routine like expression_tree_walker?

That might be one reason why all the regression tests pass, but
there's another: we aren't freeing as many paths as we used to.
/*
- * Delete the data pointed-to by the deleted cell, if possible
+ * TODO: write a routine to unlink a path from the list node and
+ * delete the list node
*/
- if (!IsA(old_path, IndexPath))
- pfree(old_path);
+ unlink_path(&old_path, 0, false, false);

If need_free = false, unlink_path will not free the path. Without this change
we were freeing at least the non-index paths, but with this change we aren't
freeing even those? Why so? We do this in all the add_path variants, so
effectively we are not freeing any paths that do not appear in
rel->pathlist. The planner is probably consuming more memory than before.

if (!IsA(new_path, IndexPath))
- pfree(new_path);
+ free_path(new_path, 0, false);

Why don't we free the subpaths if they aren't referenced anymore?

This idea of a refcounter resolves the problem with blind usage of
add_path. It is not only about extensions, which sometimes want to add
paths on different levels of the join tree and don't care about dangling
pointers. It is also about possible hidden bugs (a freed path staying in
the path list, just not employed) that aren't yet detected due to
costings and low test coverage.

Yeah. I started this project with the goal of reducing the memory
consumed by the planner when many partitions are involved. But I think
it will be more useful for managing paths.

After further consideration, I believe the main issue lies in managing
increments and decrements of a path refcounter, especially in light of
ongoing code changes. Additionally, recursion presents another challenge.

To address this complexity, I propose the following solutions:
1. Localise reference management within the add_path function.
2. Transition from a 'strict' approach, which removes all unused paths,
to a more straightforward 'pinning' method: simply mark a path as
'used' when a path added to an upper pathlist references it. By
reclaiming less memory, we keep the code much simpler.

Yes. This was one of the ideas Tom had proposed earlier in another
thread to manage paths better and avoid dangling pointers. Maybe it's
worth starting with that first: get rid of the special handling of
index paths, and then improve the memory situation. However, even with
that, I think we should free more paths rather than fewer.

--
Best Wishes,
Ashutosh Bapat

#23Andrei Lepikhov
lepihov@gmail.com
In reply to: Ashutosh Bapat (#22)
Re: Memory consumed by paths during partitionwise join planning

On 18/7/2025 13:48, Ashutosh Bapat wrote:

On Mon, Jul 7, 2025 at 8:43 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
if (!IsA(new_path, IndexPath))
- pfree(new_path);
+ free_path(new_path, 0, false);

Why don't we free the subpaths if they aren't referenced anymore?

During testing, I discovered that we sometimes encounter unusual cases.
For example, imagine a pathlist:
{SeqScan, IndexOnlyScan1, IndexScan2}
Someone may decide that Sort+SeqScan is cheaper than IndexScan2, or
that IncrementalSort+IndexOnlyScan1 is cheaper than IndexScan2 ... and
add that path to the same pathlist. I am unsure which exact combinations
may arise in the planner, but without strict rules, someone may forget
to increment the refcounter.

To address this complexity, I propose the following solutions:
1. Localise reference management within the add_path function.
2. Transition from a 'strict' approach, which removes all unused paths,
to a more straightforward 'pinning' method: simply mark a path as
'used' when a path added to an upper pathlist references it. By
reclaiming less memory, we keep the code much simpler.

Yes. This was one of the ideas Tom had proposed earlier in another
thread to manage paths better and avoid dangling pointers. Maybe it's
worth starting with that first: get rid of the special handling of
index paths, and then improve the memory situation. However, even with
that, I think we should free more paths rather than fewer.

It seems like one more trade-off: more eager cleaning means more
resources spent.

P.S. path_walker makes sense to implement.

--
regards, Andrei Lepikhov

#24Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrei Lepikhov (#23)
Re: Memory consumed by paths during partitionwise join planning

On Tue, Jul 22, 2025 at 7:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 18/7/2025 13:48, Ashutosh Bapat wrote:

On Mon, Jul 7, 2025 at 8:43 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
if (!IsA(new_path, IndexPath))
- pfree(new_path);
+ free_path(new_path, 0, false);

Why don't we free the subpaths if they aren't referenced anymore?

During testing, I discovered that we sometimes encounter unusual cases.
For example, imagine a pathlist:
{SeqScan, IndexOnlyScan1, IndexScan2}
Someone may decide that Sort+SeqScan is cheaper than IndexScan2, or
that IncrementalSort+IndexOnlyScan1 is cheaper than IndexScan2 ... and
add that path to the same pathlist. I am unsure which exact combinations
may arise in the planner, but without strict rules, someone may forget
to increment the refcounter.

When the Sort+SeqScan path is created, the SeqScan's refcount is
incremented; if that path dominates IndexScan2, which add_path then
removes from the pathlist, IndexScan2 should be freed if no other path
references it. One of the objectives of this patch is to make refcounts
easy to maintain. That's why the refcount is incremented in a central
place, such as at path creation time or when adding a path to a
pathlist. Any case where it is not incremented should be studied and
fixed.

To address this complexity, I propose the following solutions:
1. Localise reference management within the add_path function.
2. Transition from a 'strict' approach, which removes all unused paths,
to a more straightforward 'pinning' method: simply mark a path as
'used' when a path added to an upper pathlist references it. By
reclaiming less memory, we keep the code much simpler.

Yes. This was one of the ideas Tom had proposed earlier in another
thread to manage paths better and avoid dangling pointers. Maybe it's
worth starting with that first: get rid of the special handling of
index paths, and then improve the memory situation. However, even with
that, I think we should free more paths rather than fewer.

It seems like one more trade-off: more eager cleaning means more
resources spent.

Sorry, I wasn't clear. Even when using a simple method to keep track of
whether a path has been referenced, we should be able to get rid of the
special handling of IndexPath in the add_path* functions:
/* Reject and recycle the new path */
if (!IsA(new_path, IndexPath))
pfree(new_path);

That way, we will be able to free even the unreferenced index paths,
thus saving more memory than now.

P.S. path_walker makes sense to implement.

+1

--
Best Wishes,
Ashutosh Bapat