Hybrid Hash/Nested Loop joins and caching results from subplans

Started by David Rowley over 5 years ago, 107 messages
#1 David Rowley
dgrowleyml@gmail.com
1 attachment(s)

Hackers,

Over on [1], Heikki mentioned the usefulness of caching results
from parameterized subplans so that they could be used again for
subsequent scans which have the same parameters as a previous scan.
On [2], I mentioned that parameterized nested loop joins could see
similar gains with such a cache. I suggested there that instead of
adding code that only allows this to work for subplans, that instead,
we add a new node type that can handle the caching for us. We can
then just inject that node type in places where it seems beneficial.

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names in mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

How the caching works:

First off, it's only good for plugging in on top of parameterized
nodes that are rescanned with different parameters. The cache itself
uses a hash table using the simplehash.h implementation. The memory
consumption is limited to work_mem. The code maintains an LRU list and
when we need to add new entries but don't have enough space to do so,
we free off older items starting at the top of the LRU list. When we
get a cache hit, we move that entry to the end of the LRU list so that
it'll be the last to be evicted.
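
To make that a little more concrete, here is a minimal standalone C sketch of
the eviction/promotion behaviour (an illustration only, not the patch's code;
the real cache keys on the scan's parameter values, stores whole result
tuples, and uses a simplehash.h hash table rather than this linear search):

#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 4               /* stand-in for the work_mem limit */

typedef struct CacheEntry
{
    int     key;                    /* parameter value, e.g. c1.relkind */
    int     result;                 /* cached inner-scan result (simplified) */
} CacheEntry;

static CacheEntry lru[CACHE_SLOTS]; /* index 0 = least recently used */
static int  nentries = 0;

/* pretend this is the expensive rescan of the inner plan */
static int
inner_scan(int key)
{
    printf("cache miss: rescanning inner side for key %d\n", key);
    return key * 100;
}

/* return the cached result for 'key', computing and caching it on a miss */
static int
cache_lookup(int key)
{
    for (int i = 0; i < nentries; i++)
    {
        if (lru[i].key == key)
        {
            /* cache hit: move the entry to the MRU end so it's evicted last */
            CacheEntry  hit = lru[i];

            memmove(&lru[i], &lru[i + 1],
                    (nentries - i - 1) * sizeof(CacheEntry));
            lru[nentries - 1] = hit;
            return hit.result;
        }
    }

    /* cache miss: evict the least recently used entry if we're full */
    if (nentries == CACHE_SLOTS)
    {
        memmove(&lru[0], &lru[1], (nentries - 1) * sizeof(CacheEntry));
        nentries--;
    }

    lru[nentries].key = key;
    lru[nentries].result = inner_scan(key);
    return lru[nentries++].result;
}

int
main(void)
{
    int params[] = {1, 2, 1, 3, 4, 5, 2};

    for (int i = 0; i < 7; i++)
        printf("key %d -> %d\n", params[i], cache_lookup(params[i]));
    return 0;
}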

When should we cache:

For nested loop joins, the decision is made purely based on cost. The
costing model looks at the expected number of calls, the distinct
value estimate and work_mem size. It then determines how many items
can be cached and then goes on to estimate an expected cache hit ratio
and also an eviction ratio. It adjusts the input costs according to
those ratios and adds some additional charges for caching and cache
lookups.
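
As a rough illustration of that arithmetic (assumed formulas and figures only,
not the patch's actual cost function), it works along these lines:

#include <stdio.h>

/*
 * Derive expected hit and eviction ratios from the expected number of
 * rescans, the distinct-value estimate for the cache key, and how many
 * entries are expected to fit within work_mem.
 */
static void
estimate_cache_ratios(double calls, double ndistinct, double fits_in_mem,
                      double *hit_ratio, double *evict_ratio)
{
    double  hits;

    if (ndistinct <= fits_in_mem)
    {
        /* every key stays cached; only the first call per key is a miss */
        hits = calls - ndistinct;
        *evict_ratio = 0.0;
    }
    else
    {
        /* only the fraction of keys that fit can produce repeat hits */
        hits = (calls - ndistinct) * (fits_in_mem / ndistinct);
        *evict_ratio = 1.0 - (fits_in_mem / ndistinct);
    }

    *hit_ratio = hits > 0.0 ? hits / calls : 0.0;
}

int
main(void)
{
    double  hit, evict;

    /* e.g. 100000 rescans, 1000 distinct keys, all fitting in work_mem */
    estimate_cache_ratios(100000.0, 1000.0, 1000.0, &hit, &evict);
    printf("hit ratio %.3f, eviction ratio %.3f\n", hit, evict);
    return 0;
}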

For subplans, since we plan subplans before we're done planning the
outer plan, there's very little information to go on about the number
of times that the cache will be looked up. For now, I've coded things
so the cache is always used for EXPR_SUBLINK type subplans. There may
be other types of subplan that could support caching too, but I've not
really gone through them all yet to determine which. I certainly know
there's some that we can't cache for.

Why caching might be good:

With hash joins, it's sometimes not so great that we have to hash the
entire inner plan and only probe a very small number of values. If we
were able to only fill the hash table with values that are needed,
then a lot of time and memory could be saved. Effectively, the
patch does exactly this with the combination of a parameterized nested
loop join with a Result Cache node above the inner scan.

For subplans, the gains can be greater because subplans are often much
more expensive to execute than what might go on the inside of a
parameterized nested loop join.

Current problems and some ways to make it better:

The patch does rely heavily on good ndistinct estimates. One
unfortunate problem is that if the planner has no statistics for
whatever it's trying to estimate for, it'll default to returning
DEFAULT_NUM_DISTINCT (200). That may cause the Result Cache to appear
much more favourable than it should. One way I can think to work
around that would be to have another function similar to
estimate_num_groups() which accepts a default value which it will
return if it was unable to find statistics to use. In this case, such
a function could just be called passing the number of input rows as
the default, which would make the costing code think each value is
unique, which would not be favourable for caching. I've not done
anything like that in what I've attached here. That solution would
also do nothing if the ndistinct estimate was available, but was just
incorrect, as it often is.
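
Purely to illustrate the effect such a function would have on the costing
decision (this helper is hypothetical and exists nowhere):

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: when the ndistinct estimate fell back on the
 * hard-coded default because there were no statistics, pretend every
 * parameter value is distinct, which makes caching look unattractive.
 */
static double
ndistinct_for_caching(double est_ndistinct, bool used_default,
                      double input_rows)
{
    return used_default ? input_rows : est_ndistinct;
}

int
main(void)
{
    /* no stats: assume all 100000 calls use distinct parameters */
    printf("%.0f\n", ndistinct_for_caching(200.0, true, 100000.0));
    /* stats available: trust the estimate */
    printf("%.0f\n", ndistinct_for_caching(1000.0, false, 100000.0));
    return 0;
}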

There are currently a few compiler warnings with the patch due to the
scope of the simplehash.h hash table. Because the scope is static
rather than extern there's a load of unused function warnings. Not
sure yet the best way to deal with this. I don't want to change the
scope to extern just to keep compilers quiet.

Also during cache_reduce_memory(), I'm performing a hash table lookup
followed by a hash table delete. I already have the entry to delete,
but there's no simplehash.h function that allows deletion by element
pointer, only by key. This wastes a hash table lookup. I'll likely
make an adjustment to the simplehash.h code to export the delete code
as a separate function to fix this.
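
Sketched standalone (illustrative only, this is not simplehash.h's actual
code), the refactoring amounts to splitting the delete path so callers that
already hold the entry pointer can skip the second lookup:

#include <stdbool.h>
#include <stdio.h>

#define NBUCKETS 8

typedef struct Entry
{
    bool    used;
    int     key;
} Entry;

/* a tiny table, pre-populated for the example; a real open-addressing
 * table must also maintain probe chains when deleting */
static Entry buckets[NBUCKETS] = {
    {true, 8}, {true, 1}, {false, 0}, {true, 3}
};

static Entry *
hash_lookup(int key)
{
    for (int i = 0; i < NBUCKETS; i++)
    {
        Entry  *e = &buckets[(key + i) % NBUCKETS];

        if (e->used && e->key == key)
            return e;
    }
    return NULL;
}

/* new: delete an entry we already hold a pointer to, no lookup needed */
static void
hash_delete_entry(Entry *e)
{
    e->used = false;
}

/* existing style: delete by key, which costs an extra lookup */
static bool
hash_delete(int key)
{
    Entry  *e = hash_lookup(key);

    if (e == NULL)
        return false;
    hash_delete_entry(e);
    return true;
}

int
main(void)
{
    /* cache_reduce_memory() already holds the entry, so delete it directly */
    Entry  *e = hash_lookup(3);

    hash_delete_entry(e);
    printf("key 3 %s; key 1 %s\n",
           hash_lookup(3) ? "present" : "gone",
           hash_delete(1) ? "deleted by key" : "missing");
    return 0;
}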

Demo:

# explain (analyze, costs off) select relname,(select count(*) from
pg_class c2 where c1.relkind = c2.relkind) from pg_class c1;
QUERY PLAN
----------------------------------------------------------------------------------------
Seq Scan on pg_class c1 (actual time=0.069..0.470 rows=391 loops=1)
SubPlan 1
-> Result Cache (actual time=0.001..0.001 rows=1 loops=391)
Cache Key: c1.relkind
Cache Hits: 387 Cache Misses: 4 Cache Evictions: 0 Cache
Overflows: 0
-> Aggregate (actual time=0.062..0.062 rows=1 loops=4)
-> Seq Scan on pg_class c2 (actual time=0.007..0.056
rows=98 loops=4)
Filter: (c1.relkind = relkind)
Rows Removed by Filter: 293
Planning Time: 0.047 ms
Execution Time: 0.536 ms
(11 rows)

# set enable_resultcache=0; -- disable result caching
SET
# explain (analyze, costs off) select relname,(select count(*) from
pg_class c2 where c1.relkind = c2.relkind) from pg_class c1;
QUERY PLAN
-------------------------------------------------------------------------------------
Seq Scan on pg_class c1 (actual time=0.070..24.619 rows=391 loops=1)
SubPlan 1
-> Aggregate (actual time=0.062..0.062 rows=1 loops=391)
-> Seq Scan on pg_class c2 (actual time=0.009..0.056
rows=120 loops=391)
Filter: (c1.relkind = relkind)
Rows Removed by Filter: 271
Planning Time: 0.042 ms
Execution Time: 24.653 ms
(8 rows)

-- Demo with parameterized nested loops
create table hundredk (hundredk int, tenk int, thousand int, hundred
int, ten int, one int);
insert into hundredk select x%100000,x%10000,x%1000,x%100,x%10,1 from
generate_Series(1,100000) x;
create table lookup (a int);
insert into lookup select x from generate_Series(1,100000)x,
generate_Series(1,100);
create index on lookup(a);
vacuum analyze lookup, hundredk;

# explain (analyze, costs off) select count(*) from hundredk hk inner
join lookup l on hk.thousand = l.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (actual time=1876.710..1876.710 rows=1 loops=1)
-> Nested Loop (actual time=0.013..1371.690 rows=9990000 loops=1)
-> Seq Scan on hundredk hk (actual time=0.005..8.451
rows=100000 loops=1)
-> Result Cache (actual time=0.000..0.005 rows=100 loops=100000)
Cache Key: hk.thousand
Cache Hits: 99000 Cache Misses: 1000 Cache Evictions:
0 Cache Overflows: 0
-> Index Only Scan using lookup_a_idx on lookup l
(actual time=0.002..0.011 rows=100 loops=1000)
Index Cond: (a = hk.thousand)
Heap Fetches: 0
Planning Time: 0.113 ms
Execution Time: 1876.741 ms
(11 rows)

# set enable_resultcache=0;
SET
# explain (analyze, costs off) select count(*) from hundredk hk inner
join lookup l on hk.thousand = l.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Aggregate (actual time=2401.351..2401.352 rows=1 loops=1)
-> Merge Join (actual time=28.412..1890.905 rows=9990000 loops=1)
Merge Cond: (l.a = hk.thousand)
-> Index Only Scan using lookup_a_idx on lookup l (actual
time=0.005..10.170 rows=99901 loops=1)
Heap Fetches: 0
-> Sort (actual time=28.388..576.783 rows=9990001 loops=1)
Sort Key: hk.thousand
Sort Method: quicksort Memory: 7760kB
-> Seq Scan on hundredk hk (actual time=0.005..11.039
rows=100000 loops=1)
Planning Time: 0.123 ms
Execution Time: 2401.379 ms
(11 rows)

Cache Overflows:

You might have noticed "Cache Overflow" in the EXPLAIN ANALYZE output.
This happens if a single scan of the inner node exhausts the cache
memory. In this case, all the other entries will already have been
evicted in an attempt to make space for the current scan's tuples.
However, if we see an overflow then the size of the results from a
single scan alone must have exceeded work_mem. There might be some
tweaking to do here as it seems a shame that a single overly large
scan would flush the entire cache. I doubt it would be too hard to
limit the flushing to some percentage of work_mem. Similar to how
large seqscans don't entirely flush shared_buffers.
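
One possible shape for that (purely hypothetical, not something the patch
does): cap how much of work_mem a single scan's tuples may claim before we
give up caching that scan, rather than evicting everything else, e.g.:

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical tweak: one scan may only claim this fraction of work_mem
 * for its cached tuples; beyond that we mark it as overflowed and stop
 * evicting other entries on its behalf, in the same spirit as large
 * seqscans only using a ring of shared_buffers.
 */
#define SINGLE_SCAN_FRACTION 0.25

static bool
scan_may_keep_caching(double scan_bytes, double work_mem_bytes)
{
    return scan_bytes <= work_mem_bytes * SINGLE_SCAN_FRACTION;
}

int
main(void)
{
    double  work_mem = 4.0 * 1024 * 1024;   /* 4MB of work_mem */

    printf("%d\n", scan_may_keep_caching(512.0 * 1024, work_mem));      /* 1 */
    printf("%d\n", scan_may_keep_caching(2.0 * 1024 * 1024, work_mem)); /* 0 */
    return 0;
}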

Current Status:

I've spent quite a bit of time getting this working. I'd like to take
a serious go at making this happen for PG14. For now, it all seems to
work. I have some concerns about bad statistics causing nested loop
joins to be favoured more than they were previously due to the result
cache further lowering the cost of them when the cache hit ratio is
thought to be high.

For now, the node type is parallel_safe, but not parallel_aware. I can
see that a parallel_aware version would be useful, but I've not done
that here. Anything in that area will not be part of my initial
effort. The unfortunate part about that is the actual hit ratio will
drop with more parallel workers since the caches of each worker are
separate.

Some tests show a 10x speedup on TPC-H Q2.

I'm interested in getting feedback on this before doing much further work on it.

Does it seem like something we might want for PG14?

David

[1]: /messages/by-id/daceb327-9a20-51f4-fe6c-60b898692305@iki.fi
[2]: /messages/by-id/CAKJS1f8oNXQ-LqjK=BOFDmxLc_7s3uFr_g4qi7Ncrjig0JOCiA@mail.gmail.com

Attachments:

resultcache_2020-05-20.patch.bz2 (application/octet-stream)
#2 Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#1)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 20 May 2020 at 12:44, David Rowley <dgrowleyml@gmail.com> wrote:

Hackers,

Over on [1], Heikki mentioned the usefulness of caching results
from parameterized subplans so that they could be used again for
subsequent scans which have the same parameters as a previous scan.
On [2], I mentioned that parameterized nested loop joins could see
similar gains with such a cache. I suggested there that instead of
adding code that only allows this to work for subplans, that instead,
we add a new node type that can handle the caching for us. We can
then just inject that node type in places where it seems beneficial.

Very cool

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names in mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

How the caching works:

First off, it's only good for plugging in on top of parameterized
nodes that are rescanned with different parameters. The cache itself
uses a hash table using the simplehash.h implementation. The memory
consumption is limited to work_mem. The code maintains an LRU list and
when we need to add new entries but don't have enough space to do so,
we free off older items starting at the top of the LRU list. When we
get a cache hit, we move that entry to the end of the LRU list so that
it'll be the last to be evicted.

When should we cache:

For nested loop joins, the decision is made purely based on cost.

I thought the main reason to do this was the case when the nested loop
subplan was significantly underestimated and we realize during execution
that we should have built a hash table. So including this based on cost
alone seems to miss a trick.

The patch does rely heavily on good ndistinct estimates.

Exactly. We know we seldom get those with many-way joins.

So +1 for adding this technique. My question is whether it should be added
as an optional facility of a parameterised sub plan, rather than an
always-needed full-strength node. That way the choice of whether to use it
can happen at execution time once we notice that we've been called too many
times.

--
Simon Riggs    http://www.2ndQuadrant.com/
Mission Critical Databases

#3 David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#2)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 21 May 2020 at 00:56, Simon Riggs <simon@2ndquadrant.com> wrote:

I thought the main reason to do this was the case when the nested loop subplan was significantly underestimated and we realize during execution that we should have built a hash table. So including this based on cost alone seems to miss a trick.

Isn't that mostly because the planner tends to choose a
non-parameterized nested loop when it thinks the outer side of the
join has just 1 row? If so, I'd say that's a separate problem as
Result Cache only deals with parameterized nested loops. Perhaps the
problem you mention could be fixed by adding some "uncertainty degree"
to the selectivity estimate function and have it return that along
with the selectivity. We'd likely not want to choose an
unparameterized nested loop when the uncertainty level is high.
Multiplying together several different selectivity estimates could
raise the uncertainty level by a magnitude.
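
A speculative sketch of what I mean (no such mechanism exists today; the
names and the combination rule are made up):

#include <stdio.h>

/*
 * Carry an uncertainty level alongside each selectivity estimate and let
 * it grow each time estimates are multiplied together, so the planner
 * could refuse risky plan shapes (such as unparameterized nested loops)
 * when the uncertainty is high.
 */
typedef struct SelEstimate
{
    double  selectivity;    /* estimated fraction of rows */
    double  uncertainty;    /* 0 = exact; larger = less trustworthy */
} SelEstimate;

static SelEstimate
sel_multiply(SelEstimate a, SelEstimate b)
{
    SelEstimate result;

    result.selectivity = a.selectivity * b.selectivity;
    /* combining two guesses compounds the chance of being badly wrong */
    result.uncertainty = a.uncertainty + b.uncertainty + 1.0;
    return result;
}

int
main(void)
{
    SelEstimate a = {0.01, 1.0};    /* from statistics: fairly trustworthy */
    SelEstimate b = {0.05, 3.0};    /* default guess: high uncertainty */
    SelEstimate c = sel_multiply(a, b);

    printf("selectivity %.4f, uncertainty %.1f\n", c.selectivity, c.uncertainty);
    return 0;
}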

For plans where the planner chooses to use a non-parameterized nested
loop due to having just 1 row on the outer side of the loop, it's
taking a huge risk. The cost of putting the 1 row on the inner side of
a hash join would barely cost anything extra during execution.
Hashing 1 row is pretty cheap and performing a lookup on that hashed
row is not much more expensive than evaluating the qual of the nested
loop. Really just requires the additional hash function calls. Having
the uncertainty degree I mentioned above would allow us to only have
the planner do that when the uncertainty degree indicates it's not
worth the risk.

David

#4 Andy Fan
zhihui.fan1213@gmail.com
In reply to: Simon Riggs (#2)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

My question is whether it should be added as an optional facility of a
parameterised sub plan, rather than an always-needed full-strength node.
That way the choice of whether to use it can happen at execution time once
we notice that we've been called too many times.

Actually, I am not sure what "parameterized sub plan" means (I treat it as a
SubPlan node), so please correct me if I misunderstand you :) The inner plan
of a nested loop is not actually a SubPlan node, so if we bind this facility
to the SubPlan node, we may lose the chance to use it for nested loops. And
when we consider the nested loop usage, the example below shows where this
feature would be even more powerful.

select j1o.i, j2_v.sum_5
from j1 j1o
inner join lateral
(select im100, sum(im5) as sum_5
from j2
where j1o.im100 = im100
and j1o.i = 1
group by im100) j2_v
on true
where j1o.i = 1;

--
Best Regards
Andy Fan

#5 David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#4)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, 22 May 2020 at 12:12, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Actually, I am not sure what "parameterized sub plan" means (I treat it as a
SubPlan node), so please correct me if I misunderstand you :) The inner plan
of a nested loop is not actually a SubPlan node, so if we bind this facility
to the SubPlan node, we may lose the chance to use it for nested loops.

A parameterized subplan would be a subquery that contains column
reference to a query above its own level. The executor changes that
column reference into a parameter and the subquery will need to be
rescanned each time the parameter's value changes.

And when we consider the nested loop usage, the example below shows where
this feature would be even more powerful.

I didn't quite get the LATERAL support done in the version I
sent. For now, I'm not considering adding a Result Cache node if there
are lateral vars in any location other than the inner side of the
nested loop join. I think it'll just be a few lines to make it work
though. I wanted to get some feedback before going to too much more
trouble to make all cases work.

I've now added this patch to the first commitfest of PG14.

David

#6 Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#5)
2 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Today I tested the correctness & performance of this patch based on the
TPC-H workload; the environment was set up based on [1]. Correctness was
tested by storing the results into another table with this feature disabled,
then enabling the feature and comparing the results with the original ones.
No issues were found at this stage.

I also checked the performance gain for the TPC-H workload: 4 out of the 22
queries use this new path, 3 of them in a subplan and 1 in a nested loop.
All of the changes get a better result. You can check the attachments for
reference; normal.log is the data without this feature, patched.log is the
data with the feature. The data doesn't show the 10x performance gain; I
think that's mainly related to the data size.

At the code level, I mainly checked the nestloop path and
cost_resultcache_rescan; everything looks good to me. I'd like to check the
other parts in the following days.

[1]: https://ankane.org/tpc-h

--
Best Regards
Andy Fan

Attachments:

patched.log (application/octet-stream)
normal.log (application/octet-stream)
#7 David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#6)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 2 Jun 2020 at 21:05, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Today I tested the correctness & performance of this patch based on the
TPC-H workload; the environment was set up based on [1]. Correctness was
tested by storing the results into another table with this feature disabled,
then enabling the feature and comparing the results with the original ones.
No issues were found at this stage.

Thank you for testing it out.

I also checked the performance gain for the TPC-H workload: 4 out of the 22
queries use this new path, 3 of them in a subplan and 1 in a nested loop.
All of the changes get a better result. You can check the attachments for
reference; normal.log is the data without this feature, patched.log is the
data with the feature. The data doesn't show the 10x performance gain; I
think that's mainly related to the data size.

Thanks for running those tests. I had a quick look at the results and
I think to say that all 4 are better is not quite right. One is
actually a tiny bit slower and one is only faster due to a plan
change. Here's my full analysis.

Q2 uses a result cache for the subplan and has about a 37.5% hit ratio
which reduces the execution time of the query down to 67% of the
original.
Q17 uses a result cache for the subplan and has about a 96.5% hit
ratio which reduces the execution time of the query down to 24% of the
original time.
Q18 uses a result cache for 2 x nested loop joins and has a 0% hit
ratio. The execution time is reduced to 91% of the original time only
because the planner uses a different plan, which just happens to be
faster by chance.
Q20 uses a result cache for the subplan and has a 0% hit ratio. The
execution time is 100.27% of the original time. There are 8620 cache
misses.
All other queries use the same plan with and without the patch.

At the code level, I mainly checked the nestloop path and
cost_resultcache_rescan; everything looks good to me. I'd like to check the
other parts in the following days.

Great.

[1] https://ankane.org/tpc-h

#8 Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#7)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for running those tests. I had a quick look at the results and
I think to say that all 4 are better is not quite right. One is
actually a tiny bit slower and one is only faster due to a plan
change.

Yes.. Thanks for pointing it out.

Q18 uses a result cache for 2 x nested loop joins and has a 0% hit
ratio. The execution time is reduced to 91% of the original time only
because the planner uses a different plan, which just happens to be
faster by chance.
Q20 uses a result cache for the subplan and has a 0% hit ratio. The
execution time is 100.27% of the original time. There are 8620 cache
misses.

It looks like the case here is a statistics issue or a cost model issue. I'd
like to check more about that. But before that, I've uploaded the steps [1] I
used in case you want to reproduce it locally.

[1]: https://github.com/zhihuiFan/tpch-postgres

--
Best Regards
Andy Fan

#9 Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#8)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, Jun 3, 2020 at 10:36 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Thanks for running those tests. I had a quick look at the results and
I think to say that all 4 are better is not quite right. One is
actually a tiny bit slower and one is only faster due to a plan
change.

Yes.. Thanks for pointing it out.

Q18 uses a result cache for 2 x nested loop joins and has a 0% hit
ratio. The execution time is reduced to 91% of the original time only
because the planner uses a different plan, which just happens to be
faster by chance.

This case is likely caused by a wrong row estimate for the condition
o_orderkey in (select l_orderkey from lineitem group by l_orderkey having
sum(l_quantity) > 312). The estimate is 123766 rows, but the actual number
is 10 rows. This estimation is hard, and I don't think we should address
that issue in this patch.

Q20 uses a result cache for the subplan and has a 0% hit ratio. The
execution time is 100.27% of the original time. There are 8620 cache
misses.

This is by design in the current implementation.

For subplans, since we plan subplans before we're done planning the
outer plan, there's very little information to go on about the number
of times that the cache will be looked up. For now, I've coded things
so the cache is always used for EXPR_SUBLINK type subplans. "

I first tried to see if we can have a row estimate before the subplan
is created, and it looks very complex. The subplan is created during
preprocess_qual_conditions; at that time we haven't even created the base
RelOptInfo, to say nothing of the join rel, whose row estimation happens
much later.

Then I looked at whether we can delay the cache decision until we have the
row estimate; ExecInitSubPlan may be a candidate. At that point we can't add
a new ResultCache node, but we could add a cache function to the SubPlan node
based on cost. However, the number of distinct values for the parameterized
variable can't be calculated, which I still leave as an open issue.

--
Best Regards
Andy Fan

#10 David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#9)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, 12 Jun 2020 at 16:10, Andy Fan <zhihui.fan1213@gmail.com> wrote:

I first tried to see if we can have a row estimate before the subplan
is created, and it looks very complex. The subplan is created during
preprocess_qual_conditions; at that time we haven't even created the base
RelOptInfo, to say nothing of the join rel, whose row estimation happens
much later.

Then I looked at whether we can delay the cache decision until we have the
row estimate; ExecInitSubPlan may be a candidate. At that point we can't add
a new ResultCache node, but we could add a cache function to the SubPlan node
based on cost. However, the number of distinct values for the parameterized
variable can't be calculated, which I still leave as an open issue.

I don't really like the idea of stuffing this feature into some
existing node type. Doing so would seem pretty magical when looking
at an EXPLAIN ANALYZE. There is of course overhead to pulling tuples
through an additional node in the plan, but if you use that as an
argument then there's some room to argue that we should only have 1
executor node type to get rid of that overhead.

Tom mentioned in [1] that he's reconsidering his original thoughts on
leaving the AlternativeSubPlan selection decision until execution
time. If that were done late in planning, as Tom mentioned, then it
would be possible to give a more accurate cost to the Result Cache as
we'd have built the outer plan by that time and would be able to
estimate the number of distinct calls to the correlated subplan. As
that feature is today we'd be unable to delay making the decision
until execution time as we don't have the required details to know how
many distinct calls there will be to the Result Cache node.

For now, I'm planning on changing things around a little in the Result
Cache node to allow faster deletions from the cache. As of now, we
must perform 2 hash lookups to perform a single delete. This is
because we must perform the lookup to fetch the entry from the MRU
list key, then an additional lookup in the hash delete code. I plan
on changing the hash delete code to expose another function that
allows us to delete an item directly if we've already looked it up.
This should make a small reduction in the overheads of the node.
Perhaps if the overhead is very small (say < 1%) when the cache is of
no use then it might not be such a bad thing to just have a Result
Cache for correlated subplans regardless of estimates. With the TPCH
Q20 test, it appeared as if the overhead was 0.27% for that particular
subplan. A simpler subplan would execute more quickly, resulting in
the Result Cache overhead being a more significant portion of the
overall subquery execution. I'd need to perform a worst-case overhead
test to get an indication of what the percentage is.

David

[1]: /messages/by-id/1992952.1592785225@sss.pgh.pa.us

#11 David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#10)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 30 Jun 2020 at 11:57, David Rowley <dgrowleyml@gmail.com> wrote:

For now, I'm planning on changing things around a little in the Result
Cache node to allow faster deletions from the cache. As of now, we
must perform 2 hash lookups to perform a single delete. This is
because we must perform the lookup to fetch the entry from the MRU
list key, then an additional lookup in the hash delete code. I plan
on changing the hash delete code to expose another function that
allows us to delete an item directly if we've already looked it up.
This should make a small reduction in the overheads of the node.
Perhaps if the overhead is very small (say < 1%) when the cache is of
no use then it might not be such a bad thing to just have a Result
Cache for correlated subplans regardless of estimates. With the TPCH
Q20 test, it appeared as if the overhead was 0.27% for that particular
subplan. A simpler subplan would execute more quickly, resulting in
the Result Cache overhead being a more significant portion of the
overall subquery execution. I'd need to perform a worst-case overhead
test to get an indication of what the percentage is.

I made the changes that I mention to speedup the cache deletes. The
patch is now in 3 parts. The first two parts are additional work and
the final part is the existing work with some small tweaks.

0001: Alters estimate_num_groups() to allow it to pass back a flags
variable to indicate if the estimate used DEFAULT_NUM_DISTINCT. The
idea here is to try and avoid using a Result Cache for a Nested Loop
join when the statistics are likely to be unreliable. Because
DEFAULT_NUM_DISTINCT is 200, if we estimate that number of distinct
values then a Result Cache is likely to look highly favourable in some
situations where it very well may not be. I've not given this patch a
huge amount of thought, but so far I don't see anything too
unreasonable about it. I'm prepared to be wrong about that though.

0002 Makes some adjustments to simplehash.h to expose a function which
allows direct deletion of a hash table element when we already have a
pointer to the bucket. I think this is a pretty good change as it
reuses more simplehash.h code than without the patch.

0003 Is the result cache code. I've done another pass over this
version and fixed a few typos and added a few comments. I've not yet
added support for LATERAL joins. I plan to do that soon. For now, I
just wanted to get something out there as I saw that the patch needed
to be rebased.

I did end up testing the overheads of having a Result Cache node on a
very simple subplan that'll never see a cache hit. The overhead is
quite a bit more than the 0.27% that we saw with TPCH Q20.

Using a query that gets zero cache hits:

$ cat bench.sql
select relname,(select oid from pg_class c2 where c1.oid = c2.oid)
from pg_Class c1 offset 1000000000;

enable_resultcache = on:

$ pgbench -n -f bench.sql -T 60 postgres
latency average = 0.474 ms
tps = 2110.431529 (including connections establishing)
tps = 2110.503284 (excluding connections establishing)

enable_resultcache = off:

$ pgbench -n -f bench.sql -T 60 postgres
latency average = 0.379 ms
tps = 2640.534303 (including connections establishing)
tps = 2640.620552 (excluding connections establishing)

Which is about a 25% overhead in this very simple case. With more
complex subqueries that overhead will drop significantly, but for that
simple one, it does seem quite a bit too high to be adding a Result
Cache unconditionally for all correlated subqueries. I think based on
that it's worth looking into the AlternativeSubPlan option that I
mentioned earlier.

I've attached the v2 patch series.

David

Attachments:

v2-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch (application/octet-stream)
From 098455e466fd72ef2aa6d2b13aaf9fa2dca96581 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v2 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits to allow it to pass back additional
details to the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..70f6fa2493 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2960,7 +2960,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4ff3c7a2fd..97758dc41c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1865,7 +1865,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..ca3132d9b7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2073,6 +2073,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4131019fc9..de30550bef 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3719,7 +3719,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3744,7 +3745,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3760,7 +3762,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4777,7 +4779,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..7e9df9461e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e845a4b1ae..37d6d293c3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1656,6 +1656,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index be08eb4814..2c5bfaf628 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3238,6 +3238,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3284,6 +3285,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3308,6 +3310,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3355,7 +3362,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3363,6 +3370,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3566,6 +3577,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.25.1

v2-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch
From 945382ad2486b5a2dff6a164482129ee3bbcea70 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v2 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
will already have the element which it would like to delete, and so can
delete it without performing a lookup.
---
 src/include/lib/simplehash.h | 115 +++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 54 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 90dfa8a695..051119b290 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -79,6 +79,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -763,75 +764,81 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
- * present.
+ * Delete 'entry' from hash table.
  */
-SH_SCOPE bool
-SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 {
-	uint32		hash = SH_HASH_KEY(tb, key);
-	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
-	uint32		curelem = startelem;
-
-	while (true)
-	{
-		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
-
-		if (entry->status == SH_STATUS_EMPTY)
-			return false;
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		curelem;
+	uint32		startelem;
 
-		if (entry->status == SH_STATUS_IN_USE &&
-			SH_COMPARE_KEYS(tb, hash, key, entry))
-		{
-			SH_ELEMENT_TYPE *lastentry = entry;
+	Assert(entry >= &tb->data[0] && entry < &tb->data[tb->size]);
 
-			tb->members--;
+	/* Calculate the index of 'entry' */
+	startelem = curelem = entry - &tb->data[0];
 
-			/*
-			 * Backward shift following elements till either an empty element
-			 * or an element at its optimal position is encountered.
-			 *
-			 * While that sounds expensive, the average chain length is short,
-			 * and deletions would otherwise require tombstones.
-			 */
-			while (true)
-			{
-				SH_ELEMENT_TYPE *curentry;
-				uint32		curhash;
-				uint32		curoptimal;
+	tb->members--;
 
-				curelem = SH_NEXT(tb, curelem, startelem);
-				curentry = &tb->data[curelem];
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
 
-				if (curentry->status != SH_STATUS_IN_USE)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
 
-				curhash = SH_ENTRY_HASH(tb, curentry);
-				curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				/* current is at optimal position, done */
-				if (curoptimal == curelem)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
 
-				/* shift */
-				memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				lastentry = curentry;
-			}
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
 
-			return true;
-		}
+		lastentry = curentry;
+	}
+}
 
-		/* TODO: return false; if distance too big */
+/*
+ * Perform hash table lookup on 'key', delete the entry belonging to it and
+ * return true.  Returns false if no item could be found relating to 'key'.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+{
+	SH_ELEMENT_TYPE *entry = SH_LOOKUP(tb, key);
 
-		curelem = SH_NEXT(tb, curelem, startelem);
+	if (likely(entry != NULL))
+	{
+		/*
+		 * Perform deletion and also the relocation of subsequent items which
+		 * are not in their optimal position but can now be moved up.
+		 */
+		SH_DELETE_ITEM(tb, entry);
+		return true;
 	}
+
+	return false;		/* Can't find 'key' */
 }
 
 /*
-- 
2.25.1

v2-0003-Add-Result-Cache-executor-node.patch
From 6eead275645e6fae2ae75db1e5091cdc77fe1568 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v2 3/3] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.

We also support caching the results from correlated subqueries.  However,
currently, since subqueries are planned before their parent query, we are
unable to obtain any estimate of the cache hit ratio.  For now, we opt
to just always put a Result Cache above a suitable correlated subquery.  In
the future, we may like to be smarter about that, but for now, the
overhead of using the Result Cache, even in cases where we never get a
cache hit, is minimal.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   28 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  112 ++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  132 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1060 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   29 +
 src/backend/nodes/outfuncs.c                  |   34 +
 src/backend/nodes/readfuncs.c                 |   21 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  127 ++
 src/backend/optimizer/path/joinpath.c         |  374 +++++-
 src/backend/optimizer/plan/createplan.c       |   81 ++
 src/backend/optimizer/plan/setrefs.c          |    1 +
 src/backend/optimizer/plan/subselect.c        |  110 ++
 src/backend/optimizer/util/pathnode.c         |   62 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    6 +
 src/include/executor/nodeResultCache.h        |   29 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   64 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   16 +
 src/include/nodes/plannodes.h                 |   18 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    8 +-
 src/test/regress/expected/groupingsets.out    |   20 +-
 src/test/regress/expected/join.out            |   51 +-
 src/test/regress/expected/join_hash.out       |   72 +-
 src/test/regress/expected/partition_prune.out |  242 ++--
 src/test/regress/expected/resultcache.out     |  100 ++
 src/test/regress/expected/rowsecurity.out     |   20 +-
 src/test/regress/expected/select_parallel.out |   28 +-
 src/test/regress/expected/subselect.out       |   24 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    2 +
 src/test/regress/sql/resultcache.sql          |   32 +
 47 files changed, 2783 insertions(+), 229 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..a5d697bd7a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1581,6 +1581,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1607,6 +1608,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2914,10 +2916,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2928,8 +2933,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2939,11 +2944,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..00b3567e0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -480,10 +480,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b81aab239f..7e17b1f13d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4637,6 +4637,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans to the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 093864cfc0..10a4fa83b6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1279,6 +1281,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1970,6 +1975,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3043,6 +3052,109 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+
+	initStringInfo(&keystr);
+
+	/* XXX surely we'll always have more than one if we have a resultcache? */
+	useprefix = list_length(es->rtable) > 1;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		int			n;
+
+		for (n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..68920ecd89 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..f32876f412 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3487,3 +3487,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use.
+ * This must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..d4c50c261d 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -293,6 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *)planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -513,6 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -989,6 +998,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1058,6 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1350,6 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 5662e7d742..7f76394851 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -309,6 +310,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Sort:
 			result = (PlanState *) ExecInitSort((Sort *) node,
 												estate, eflags);
@@ -695,6 +701,10 @@ ExecEndNode(PlanState *node)
 			ExecEndMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecEndSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..396d2aee18
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1060 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *-------------------------------------------------------------------------
+ */
+ /*
+  * INTERFACE ROUTINES
+  *		ExecResultCache			- materialize the result of a subplan
+  *		ExecInitResultCache		- initialize node and subnodes
+  *		ExecEndResultCache		- shutdown node and subnodes
+  *		ExecReScanResultCache	- rescan the result cache
+  */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+ /*
+ * ResultCacheTuple
+ * Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;			/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node		lru_node;	/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;			/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32			status;			/* Hash status */
+	uint32			hash;			/* Hash value (cached) */
+	uint64			entry_mem;		/* Bytes of memory used by this entry */
+	bool			complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key);
+static int ResultCacheHash_equal(struct resultcache_hash *tb,
+								 const ResultCacheKey *params1,
+								 const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot	 *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid			*collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])			/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+								  collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used, instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate)
+{
+	/* XXX should the planner decide on the bucket count? */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, 1024,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int				numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple   *tuple = entry->tuplehead;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	/* Update memory accounting for this entry and the entire cache */
+	rcstate->mem_used -= entry->entry_mem;
+	entry->entry_mem = EMPTY_ENTRY_MEMORY_BYTES(entry);
+	rcstate->mem_used += entry->entry_mem;
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey	   *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	rcstate->mem_used -= entry->entry_mem;
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the
+ * entry which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool					specialkey_intact = true;		/* for now */
+	dlist_mutable_iter		iter;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey   *key = dlist_container(ResultCacheKey, lru_node,
+												iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this
+		 * LRU entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * specialkey_intact to false to inform the caller it has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey		   *key;
+	ResultCacheEntry	   *entry;
+	MemoryContext			oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/* Move existing entry to the tail of the LRU list */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Mark the number of bytes which are used by this entry */
+	entry->entry_mem = EMPTY_ENTRY_MEMORY_BYTES(entry);
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += entry->entry_mem;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple	   *tuple;
+	ResultCacheEntry	   *entry = rcstate->entry;
+	MemoryContext			oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	entry->entry_mem += CACHE_TUPLE_BYTES(tuple);
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+		rcstate->last_tuple = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		rcstate->last_tuple->next = tuple;
+		rcstate->last_tuple = tuple;
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1; /* stats update */
+
+					/* Fetch the first cached tuple, if there is one */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* No tuples in this cache entry. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1; /* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to
+						 * failure to free enough cache space, so ensure we
+						 * don't do anything here that assumes it worked.
+						 * There's no need to go into bypass mode here as
+						 * we're setting rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+						!cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1; /* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				ResultCacheEntry	*entry = node->entry;
+				Assert(entry != NULL);
+
+				/* Skip to the next tuple to output. */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate that the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.
+					 * XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1; /* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	} /* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves having a special case for
+	 * creating the first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												   &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations;	/* Just point directly to the plan data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = work_mem * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Record whether we can assume a cache entry is complete after we fetch
+	 * the first record for it.  Some callers might not call us again after
+	 * getting the first match.  e.g. A join operator performing a unique
+	 * join is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple, so we can mark it as such straight away.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the
+	 * main process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+					+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
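
As a note for anyone reviewing the memory management above: it boils down to
accounting for each stored tuple and, once the total crosses the work_mem
budget, evicting whole entries from the head of the LRU list until usage
falls back under a lower watermark set slightly below the limit, so we don't
pay an eviction on every subsequent insert.  Here's a minimal, self-contained
C sketch of that idea (illustrative only, not code from the patch; the real
code uses simplehash.h entries and PostgreSQL's dlist, and these names are
invented):

#include <stdio.h>
#include <stdlib.h>

/* Toy cache entry: an LRU list node plus its memory consumption. */
typedef struct ToyEntry
{
    struct ToyEntry *prev;
    struct ToyEntry *next;
    size_t      entry_mem;
} ToyEntry;

typedef struct ToyCache
{
    ToyEntry   *lru_head;       /* least recently used entry */
    ToyEntry   *lru_tail;       /* most recently used entry */
    size_t      mem_used;
    size_t      mem_upperlimit; /* start evicting when we exceed this */
    size_t      mem_lowerlimit; /* evict down to this, e.g. 98% of upper */
} ToyCache;

/* Evict from the LRU head until we're back under the lower watermark. */
static void
toy_reduce_memory(ToyCache *cache)
{
    while (cache->mem_used > cache->mem_lowerlimit && cache->lru_head != NULL)
    {
        ToyEntry   *victim = cache->lru_head;

        cache->lru_head = victim->next;
        if (cache->lru_head)
            cache->lru_head->prev = NULL;
        else
            cache->lru_tail = NULL;

        cache->mem_used -= victim->entry_mem;
        free(victim);
    }
}

/* Add an entry at the MRU end, evicting only once the upper limit is hit. */
static void
toy_add_entry(ToyCache *cache, size_t entry_mem)
{
    ToyEntry   *e = calloc(1, sizeof(ToyEntry));

    e->entry_mem = entry_mem;
    e->prev = cache->lru_tail;
    if (cache->lru_tail)
        cache->lru_tail->next = e;
    else
        cache->lru_head = e;
    cache->lru_tail = e;

    cache->mem_used += entry_mem;
    if (cache->mem_used > cache->mem_upperlimit)
        toy_reduce_memory(cache);
}

int
main(void)
{
    ToyCache    cache = {NULL, NULL, 0, 4 * 1024 * 1024, 0};
    int         i;

    cache.mem_lowerlimit = cache.mem_upperlimit * 0.98;

    for (i = 0; i < 10000; i++)
        toy_add_entry(&cache, 1024);

    printf("bytes cached: %zu\n", cache.mem_used);
    return 0;
}

The 2% gap between the two limits is the same hysteresis that
ExecInitResultCache() sets up with mem_upperlimit and mem_lowerlimit.
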
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d8cf87e6d0..449fd93542 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -927,6 +927,32 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4937,6 +4963,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..ab433854bf 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -836,6 +836,20 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1922,20 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3809,6 +3837,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4043,6 +4074,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..49ab438dbc 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,25 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2851,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d984da25d7..72b0aa6b2e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4073,6 +4073,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 97758dc41c..40b9d1b576 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -133,6 +133,7 @@ bool		enable_hashagg = true;
 bool		hashagg_avoid_disk_plan = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2297,6 +2298,127 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines and returns the estimated cost of using a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		scan_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/* estimate the bytes each cache entry will consume in the cache */
+	scan_bytes = relation_byte_size(tuples, width);
+
+	/* estimate the upper limit of cache entries we can hold at once */
+	est_cache_entries = work_mem_bytes / scan_bytes;
+
+	/* estimate the number of distinct parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans of this node are expected in total and how
+	 * many of those we expect to be cache hits.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of if it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of if it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4022,6 +4144,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
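
To make the arithmetic above a little more concrete, here's a stand-alone
sketch (not part of the patch; the input figures are invented) that plugs
example numbers into the same hit and eviction ratio formulas that
cost_resultcache_rescan() uses:

#include <stdio.h>

int
main(void)
{
    double      calls = 10000.0;        /* expected rescans of the node */
    double      ndistinct = 500.0;      /* distinct sets of parameter values */
    double      est_cache_entries = 300.0;  /* entries that fit in work_mem */
    double      fit = est_cache_entries < ndistinct ? est_cache_entries : ndistinct;
    double      evict_ratio;
    double      hit_ratio;

    /* portion of scans that will need to evict something from the cache */
    evict_ratio = 1.0 - fit / ndistinct;

    /* expected portion of scans that find their entry already cached */
    hit_ratio = fit / ndistinct - ndistinct / calls;
    if (hit_ratio < 0.0)
        hit_ratio = 0.0;

    printf("evict_ratio = %.3f, hit_ratio = %.3f\n", evict_ratio, hit_ratio);
    return 0;
}

With 10,000 expected calls, 500 distinct parameter values and room for 300
entries, that prints an eviction ratio of 0.400 and a hit ratio of 0.550,
i.e. a little over half of the rescans are expected to be answered from the
cache.
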
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..53f259fa55 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,162 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows,
+ *		including looking for volatile functions in the inner side of the
+ *		join.  Also, fetch the outer side exprs and check that there is a
+ *		valid hashable equality operator for each one.  Returns true and
+ *		sets the 'param_exprs' and 'operators' output parameters if caching
+ *		is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	List	   *clauses = param_info->ppi_clauses;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	Assert(list_length(clauses) > 0);
+
+	foreach(lc, clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		OpExpr	   *opexpr;
+		TypeCacheEntry *typentry;
+		Node	   *expr;
+
+		opexpr = (OpExpr *) rinfo->clause;
+
+		/* ppi_clauses should always meet this requirement */
+		if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+			!clause_sides_match_join(rinfo, outerrel, innerrel))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		if (rinfo->outer_is_left)
+			expr = (Node *) list_nth(opexpr->args, 0);
+		else
+			expr = (Node *) list_nth(opexpr->args, 1);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/* We can only have a result cache when there's some kind of cache key */
+	if (inner_path->param_info == NULL ||
+		inner_path->param_info->ppi_clauses == NIL)
+		return NULL;
+
+	/*
+	 * We can't use a result cache when a lateral join var is required from
+	 * somewhere other than the inner side of the join.
+	 *
+	 * XXX maybe we can just include lateral_vars from above this in the
+	 * result cache's keys?  Not today though. It seems likely to reduce cache
+	 * hits which may make it very seldom worthwhile.
+	 */
+	if (!bms_is_subset(innerrel->lateral_relids, innerrel->relids))
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean resultcache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -376,6 +543,8 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+	Path	   *inner_cache_path;
+	bool		added_path = false;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -458,12 +627,92 @@ try_nestloop_path(PlannerInfo *root,
 									  extra->restrictlist,
 									  pathkeys,
 									  required_outer));
+		added_path = true;
+	}
+
+	/*
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
+	 */
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
+
+	if (inner_cache_path == NULL)
+	{
+		if (!added_path)
+			bms_free(required_outer);
+		return;
+	}
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+
+				if (!added_path)
+					bms_free(required_outer);
+				return;
+			}
+		}
+
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  extra,
+									  outer_path,
+									  inner_cache_path,
+									  extra->restrictlist,
+									  pathkeys,
+									  required_outer));
+		added_path = true;
 	}
 	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
+	}
+
+	if (!added_path)
 	{
 		/* Waste no memory when we reject a path here */
 		bms_free(required_outer);
 	}
+
 }
 
 /*
@@ -481,6 +730,9 @@ try_partial_nestloop_path(PlannerInfo *root,
 						  JoinPathExtraData *extra)
 {
 	JoinCostWorkspace workspace;
+	RelOptInfo *innerrel = inner_path->parent;
+	RelOptInfo *outerrel = outer_path->parent;
+	Path	   *inner_cache_path;
 
 	/*
 	 * If the inner path is parameterized, the parameterization must be fully
@@ -492,7 +744,6 @@ try_partial_nestloop_path(PlannerInfo *root,
 	if (inner_path->param_info != NULL)
 	{
 		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
-		RelOptInfo *outerrel = outer_path->parent;
 		Relids		outerrelids;
 
 		/*
@@ -511,41 +762,114 @@ try_partial_nestloop_path(PlannerInfo *root,
 
 	/*
 	 * Before creating a path, get a quick lower bound on what it is likely to
-	 * cost.  Bail out right away if it looks terrible.
+	 * cost.  Don't bother if it looks terrible.
 	 */
 	initial_cost_nestloop(root, &workspace, jointype,
 						  outer_path, inner_path, extra);
-	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
-		return;
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
+
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+		{
+			inner_path = reparameterize_path_by_child(root, inner_path,
+													  outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!inner_path)
+				return;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
 
 	/*
-	 * If the inner path is parameterized, it is parameterized by the topmost
-	 * parent of the outer rel, not the outer rel itself.  Fix that.
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
 	 */
-	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
-	{
-		inner_path = reparameterize_path_by_child(root, inner_path,
-												  outer_path->parent);
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
 
+	if (inner_cache_path == NULL)
+		return;
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
 		/*
-		 * If we could not translate the path, we can't create nest loop path.
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
 		 */
-		if (!inner_path)
-			return;
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+				return;
+			}
+			else
+				inner_cache_path = reparameterize_path;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_cache_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
+	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
 	}
 
-	/* Might be good enough to be worth trying, so let's try it. */
-	add_partial_path(joinrel, (Path *)
-					 create_nestloop_path(root,
-										  joinrel,
-										  jointype,
-										  &workspace,
-										  extra,
-										  outer_path,
-										  inner_path,
-										  extra->restrictlist,
-										  pathkeys,
-										  NULL));
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eb9543f6ad..fc0e75d0d3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,9 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +450,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1516,6 +1527,54 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6359,6 +6418,26 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6947,6 +7026,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6992,6 +7072,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e946..13d1af1df1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -677,6 +677,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			break;
 
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index b02fcb9bfe..16f45f38b3 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -135,6 +136,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		for the given parameters and populate 'param_exprs' with each
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side-effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -232,6 +301,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless.  It may be better to only do this when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *cache_path;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			cache_path = create_resultcache_path(root,
+												 best_path->parent,
+												 best_path,
+												 param_exprs,
+												 operators,
+												 false,
+												 -1);
+			best_path = (Path *) cache_path;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
@@ -2684,6 +2787,13 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			/* XXX Check this is correct */
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 37d6d293c3..31c4a1bb72 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1519,6 +1519,48 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3816,6 +3858,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4053,6 +4106,15 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
 		case T_UniquePath:
 			{
 				UniquePath *upath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 75fc6f11d6..42c1d400e2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1021,6 +1021,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of caching results from parameterized plan nodes."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3a25287a39..481e1b6005 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -356,6 +356,7 @@
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
+#enable_resultcache = on
 #enable_mergejoin = on
 #enable_nestloop = on
 #enable_parallel_append = on
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c7deeac662..3a3a24941d 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -263,6 +263,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..e9c0c0cfd8
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
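
dlist_move_tail() is what keeps the LRU ordering cheap: on every cache hit
the entry is unlinked from wherever it currently sits and re-linked at the
tail, so the head of the list is always the least recently used entry and
the first eviction candidate.  For anyone who doesn't want to page in the
circular-list details of ilist.h, the same move-to-tail operation on a plain
doubly-linked list looks roughly like this (a stand-alone sketch, not the
ilist.h implementation):

#include <stdio.h>

typedef struct Node
{
    struct Node *prev;
    struct Node *next;
} Node;

typedef struct List
{
    Node       *head;           /* least recently used */
    Node       *tail;           /* most recently used */
} List;

/* Unlink 'node' from wherever it is and re-link it at the tail. */
static void
move_tail(List *list, Node *node)
{
    /* fast path if it's already the tail */
    if (list->tail == node)
        return;

    /* unlink; node != tail, so node->next is never NULL here */
    if (node->prev)
        node->prev->next = node->next;
    else
        list->head = node->next;
    node->next->prev = node->prev;

    /* push onto the tail */
    node->prev = list->tail;
    node->next = NULL;
    list->tail->next = node;
    list->tail = node;
}

int
main(void)
{
    Node        a = {NULL, NULL}, b = {NULL, NULL}, c = {NULL, NULL};
    List        list;

    /* build a <-> b <-> c, with 'a' as the LRU head */
    a.next = &b; b.prev = &a;
    b.next = &c; c.prev = &b;
    list.head = &a;
    list.tail = &c;

    move_tail(&list, &b);       /* simulate a cache hit on 'b' */

    printf("tail is b? %d\n", list.tail == &b);     /* prints 1 */
    return 0;
}
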
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f5dfa32d55..90a114142e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1982,6 +1983,69 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of times we've skipped the subnode
+								 * scan due to tuples already being cached */
+	uint64		cache_misses;	/* number of times we've had to scan the
+								 * subnode to fetch tuples */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct resultcache_hash *hashtable; /* hash table cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* limit the size of the cache to this (bytes) */
+	uint64		mem_lowerlimit; /* reduce memory usage below this when we free
+								 * up space */
+	MemoryContext tableContext; /* memory context for actual cache */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..94ab62f318 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -241,6 +243,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..f83d6a71b1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,22 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..30a4f58a41 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,24 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each cache key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 92e70ec0d9..ab4f24648f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -58,6 +58,7 @@ extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool hashagg_avoid_disk_plan;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..816fb3366f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3bd184ae29..bdc8f3c742 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -950,12 +950,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -2523,6 +2525,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2538,6 +2541,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 03ada654bb..d78be811d9 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -742,19 +742,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..d5a8eba085 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -2973,8 +2975,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2984,11 +2986,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
@@ -3510,8 +3514,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3521,17 +3525,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3541,9 +3547,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4890,14 +4898,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..5c826792f5 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
-                 Output: (hjtest_1.b * 5)
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
-                       Output: (hjtest_2.c * 5)
+                 ->  Result Cache
+                       Output: ((hjtest_2.c * 5))
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
-           Output: (hjtest_1.b * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_1.b * 5))
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
+                 ->  Result Cache
+                       Output: ((hjtest_1.b * 5))
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
+         SubPlan 2
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
                  ->  Result
                        Output: (hjtest_1.b * 5)
-         SubPlan 2
-           ->  Result
-                 Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
-           Output: (hjtest_2.c * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_2.c * 5))
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 4315e8e0a3..acee21c08e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2060,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2070,36 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(31 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2145,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2182,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                          explain_parallel_append                                           
+------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2220,30 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..a231c080f8
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,100 @@
+-- Perform tests on the Result Cache node.
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.twenty
+           Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 4622  Cache Misses: 5378 Cache Evictions: 4851  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=5378)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=5378)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+RESET work_mem;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on tenk1 t1 (actual rows=3333 loops=3)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(12 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=10000 loops=1)
+         ->  Seq Scan on tenk1 t2 (actual rows=10000 loops=1)
+         ->  Result Cache (actual rows=1 loops=10000)
+               Cache Key: t2.twenty
+               Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(9 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+ count |        avg         
+-------+--------------------
+ 10000 | 9.5000000000000000
+(1 row)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..0d2b3c5c10 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1167,9 +1171,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 4c6cd5f146..9993bca2fd 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -844,19 +844,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 01b7786f01..331767c4dd 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -87,10 +87,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..317cd56eb2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..04f0473b92 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -198,6 +198,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 044d515507..2eac836e76 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1076,9 +1076,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..b0bc88140f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6658455a74..bc923ae873 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..ecf857c7f6
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,32 @@
+-- Perform tests on the Result Cache node.
+
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET work_mem;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
-- 
2.25.1

#12David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#11)
7 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 2 Jul 2020 at 22:57, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached the v2 patch series.

There was a bug in v2 that caused the caching not to work properly
when a unique join skipped to the next outer row after finding the
first match. The cache was not correctly marked as complete in that
case. Normally we only mark the cache entry complete when we read the
scan to completion. Unique joins are a special case where we can mark
the entry complete early.
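
To illustrate what "marking the entry complete early" means, here's a
minimal standalone sketch. It is not the patch's executor code and the
names are made up; it only shows the idea of a 'singlerow' shortcut:

#include <stdbool.h>
#include <stdio.h>

typedef struct CacheEntrySketch
{
	bool		complete;	/* can future lookups be served from the cache? */
	int			ntuples;	/* number of tuples stored so far */
} CacheEntrySketch;

/* Called after caching one tuple for the current parameter value. */
static void
mark_if_complete(CacheEntrySketch *entry, bool singlerow, bool scan_exhausted)
{
	/*
	 * 'singlerow' is set when the caller (e.g. a unique join) promises it
	 * will never ask for a second tuple for this parameter value;
	 * 'scan_exhausted' covers the normal case of having read the inner
	 * scan to completion.
	 */
	if (singlerow || scan_exhausted)
		entry->complete = true;
}

int
main(void)
{
	CacheEntrySketch entry = {false, 1};

	/* unique join: mark complete after the first (and only) match */
	mark_if_complete(&entry, true, false);
	printf("complete = %d\n", entry.complete);
	return 0;
}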

I've also made a few more changes to reduce the size of the
ResultCacheEntry struct, taking it from 40 bytes down to 24. That
matters quite a bit when the cached tuple is very narrow. Because we can
now fit more entries in the cache, one of the tests in resultcache.out
now reports a 15% increase in cache hits.
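
Purely to illustrate where savings like that can come from, here's a
standalone sketch. The fields below are invented, not the actual
ResultCacheEntry layout, but they show how dropping two 8-byte fields
takes a typical 64-bit struct from 40 bytes down to 24:

#include <stdbool.h>
#include <stdio.h>

/* invented 40-byte layout: three pointers, a 64-bit counter and a flag */
typedef struct WideEntry
{
	void	   *key;
	void	   *tuples;
	void	   *lru_node;
	unsigned long long bytes;
	bool		complete;
} WideEntry;

/* invented 24-byte layout: two pointers, a 32-bit count and a flag */
typedef struct NarrowEntry
{
	void	   *key;
	void	   *tuples;
	unsigned int ntuples;
	bool		complete;
} NarrowEntry;

int
main(void)
{
	printf("wide = %zu bytes, narrow = %zu bytes\n",
		   sizeof(WideEntry), sizeof(NarrowEntry));
	return 0;
}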

I also improved the costing's estimate of how many cache entries will
fit in work_mem. Previously I was not accounting for the size of the
cache data structures themselves; v2 only counted the tuples. It's
important to include that overhead, because otherwise the costing
thinks it can fit more entries than it really can, the estimated number
of cache evictions comes out too low, and we can end up preferring a
result cache plan when we perhaps shouldn't.
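
As a rough standalone sketch of that arithmetic (the 64-byte per-entry
overhead and the other numbers here are made up for illustration; this
is not the patch's costing code):

#include <stdio.h>

/*
 * Estimate how many cache entries fit in work_mem.  'per_entry_overhead'
 * stands in for the hash bucket, key tuple and other bookkeeping that
 * each entry costs on top of its cached tuples.
 */
static double
est_cache_entries(double work_mem_bytes, double tuples_per_entry,
				  double tuple_bytes, double per_entry_overhead)
{
	double		entry_bytes = tuples_per_entry * tuple_bytes + per_entry_overhead;

	return work_mem_bytes / entry_bytes;
}

int
main(void)
{
	double		work_mem = 4.0 * 1024 * 1024;	/* 4MB */

	/* ignoring the overhead vs. charging (say) 64 bytes per entry */
	printf("tuples only:   %.0f entries\n",
		   est_cache_entries(work_mem, 1.0, 50.0, 0.0));
	printf("with overhead: %.0f entries\n",
		   est_cache_entries(work_mem, 1.0, 50.0, 64.0));
	return 0;
}

The second number is the kind of figure the costing should be working
with; the first is effectively what v2 assumed.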

I've attached v4.

I've also attached a bunch of benchmark results which were based on v3
of the patch. I didn't send out v3, but the results of v4 should be
almost the same for this test. The script to run the benchmark is
contained in the resultcachebench.txt file. The benchmark just mocks
up a "parts" table and a "sales" table. The parts table has 1 million
rows in the 1 million test, as does the sales table. This goes up to
10 million and 100 million in the other two tests. What varies with
each bar in the chart is the number of distinct parts in the sales
table. I just started with 1 part then doubled that up to ~1 million.
The unpatched version always uses a Hash Join, which is wasteful since
only a subset of parts are looked up. In the 1 million test the
patched version switches over to a Hash Join at 65k distinct parts.
In the 10 million and 100 million tests, it doesn't switch over until
the 1 million distinct parts case. The hash join costs are higher
there due to multi-batching, which is why the crossover
point is higher on the larger scale tests. I used 256MB work_mem for
all tests. Looking closely at the 10 million test, you can see that
the hash join starts taking longer from 128 parts onward. The hash
table is the same each time here, so I can only suspect that the
slowdown between 64 and 128 parts is due to CPU cache thrashing when
getting the correct buckets from the overly large hash table. This is
not really visible in the patched version as the resultcache hash
table is much smaller.
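
In case it's useful without opening the attachment, here's a cut-down
sketch of what the benchmark does (the real script is in
resultcachebench.txt; the column names below are just illustrative):

-- :nparts was varied from 1 up to ~1 million, doubling each time
\set nparts 1024

CREATE TABLE parts (part_id int PRIMARY KEY, name text);
INSERT INTO parts SELECT x, 'part-' || x FROM generate_series(1, 1000000) x;

CREATE TABLE sales (part_id int, qty int);
INSERT INTO sales SELECT x % :nparts + 1, 1 FROM generate_series(1, 1000000) x;
ANALYZE parts, sales;

EXPLAIN (ANALYZE)
SELECT count(*) FROM sales s INNER JOIN parts p ON p.part_id = s.part_id;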

David

Attachments:

resultcachebench.txt
v4-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch
From a2d586a7fec1fe9e1055b9534630af57746e90a1 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v4 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, so it can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 115 +++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 54 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 90dfa8a695..051119b290 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -79,6 +79,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -763,75 +764,81 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
- * present.
+ * Delete 'entry' from hash table.
  */
-SH_SCOPE bool
-SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 {
-	uint32		hash = SH_HASH_KEY(tb, key);
-	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
-	uint32		curelem = startelem;
-
-	while (true)
-	{
-		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
-
-		if (entry->status == SH_STATUS_EMPTY)
-			return false;
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		curelem;
+	uint32		startelem;
 
-		if (entry->status == SH_STATUS_IN_USE &&
-			SH_COMPARE_KEYS(tb, hash, key, entry))
-		{
-			SH_ELEMENT_TYPE *lastentry = entry;
+	Assert(entry >= &tb->data[0] && entry < &tb->data[tb->size]);
 
-			tb->members--;
+	/* Calculate the index of 'entry' */
+	startelem = curelem = entry - &tb->data[0];
 
-			/*
-			 * Backward shift following elements till either an empty element
-			 * or an element at its optimal position is encountered.
-			 *
-			 * While that sounds expensive, the average chain length is short,
-			 * and deletions would otherwise require tombstones.
-			 */
-			while (true)
-			{
-				SH_ELEMENT_TYPE *curentry;
-				uint32		curhash;
-				uint32		curoptimal;
+	tb->members--;
 
-				curelem = SH_NEXT(tb, curelem, startelem);
-				curentry = &tb->data[curelem];
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
 
-				if (curentry->status != SH_STATUS_IN_USE)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
 
-				curhash = SH_ENTRY_HASH(tb, curentry);
-				curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				/* current is at optimal position, done */
-				if (curoptimal == curelem)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
 
-				/* shift */
-				memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				lastentry = curentry;
-			}
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
 
-			return true;
-		}
+		lastentry = curentry;
+	}
+}
 
-		/* TODO: return false; if distance too big */
+/*
+ * Perform hash table lookup on 'key', delete the entry belonging to it and
+ * return true.  Returns false if no item could be found relating to 'key'.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+{
+	SH_ELEMENT_TYPE *entry = SH_LOOKUP(tb, key);
 
-		curelem = SH_NEXT(tb, curelem, startelem);
+	if (likely(entry != NULL))
+	{
+		/*
+		 * Perform deletion and also the relocation of subsequent items which
+		 * are not in their optimal position but can now be moved up.
+		 */
+		SH_DELETE_ITEM(tb, entry);
+		return true;
 	}
+
+	return false;		/* Can't find 'key' */
 }
 
 /*
-- 
2.25.1

v4-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch
From bc07feecf1b356a03a42003fa9e806802119d08d Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v4 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() which allows it
to set bits in a flags variable in order to pass additional details back to
the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..70f6fa2493 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2960,7 +2960,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4ff3c7a2fd..97758dc41c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1865,7 +1865,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..ca3132d9b7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2073,6 +2073,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4131019fc9..de30550bef 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3719,7 +3719,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3744,7 +3745,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3760,7 +3762,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4777,7 +4779,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..7e9df9461e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e845a4b1ae..37d6d293c3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1656,6 +1656,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index be08eb4814..2c5bfaf628 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3238,6 +3238,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3284,6 +3285,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3308,6 +3310,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3355,7 +3362,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3363,6 +3370,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3566,6 +3577,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.25.1

v4-0003-Add-Result-Cache-executor-node.patch
From 3097a49f5f97034143f260015c2ec1add7d7d596 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v4 3/3] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and, when we
require more space in the table for new entries, we evict entries starting
with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
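
For example (the table names here are purely illustrative), such a plan
might look like:

    Nested Loop
      ->  Seq Scan on orders o
      ->  Result Cache
            Cache Key: o.product_id
            ->  Index Scan using products_pkey on products p
                  Index Cond: (product_id = o.product_id)

Only the product_ids which actually appear in orders are ever added to the
cache, whereas a hash join would have had to hash every row in products
up-front.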

We also support caching the results from correlated subqueries.  However,
currently, since subqueries are planned before their parent query, we are
unable to obtain any estimations on the cache hit ratio.  For now, we opt
to just always put a Result Cache above a suitable correlated subquery. In
the future, we may like to be smarter about that, but for now, the
overhead of using the Result Cache, even in cases where we never get a
cache hit, is minimal.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   28 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  112 ++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  132 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1110 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  149 +++
 src/backend/optimizer/path/joinpath.c         |  374 +++++-
 src/backend/optimizer/plan/createplan.c       |   86 ++
 src/backend/optimizer/plan/setrefs.c          |    1 +
 src/backend/optimizer/plan/subselect.c        |  110 ++
 src/backend/optimizer/util/pathnode.c         |   69 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    6 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   64 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   20 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    8 +-
 src/test/regress/expected/groupingsets.out    |   20 +-
 src/test/regress/expected/join.out            |   51 +-
 src/test/regress/expected/join_hash.out       |   72 +-
 src/test/regress/expected/partition_prune.out |  242 ++--
 src/test/regress/expected/resultcache.out     |  100 ++
 src/test/regress/expected/rowsecurity.out     |   20 +-
 src/test/regress/expected/select_parallel.out |   28 +-
 src/test/regress/expected/subselect.out       |   24 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    2 +
 src/test/regress/sql/resultcache.sql          |   32 +
 47 files changed, 2877 insertions(+), 229 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..a5d697bd7a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1581,6 +1581,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1607,6 +1608,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2914,10 +2916,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2928,8 +2933,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2939,11 +2944,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..00b3567e0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -480,10 +480,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b81aab239f..7e17b1f13d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4637,6 +4637,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        node to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 093864cfc0..10a4fa83b6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1279,6 +1281,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1970,6 +1975,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3043,6 +3052,109 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+
+	initStringInfo(&keystr);
+
+	/* XXX surely we'll always have more than one if we have a resultcache? */
+	useprefix = list_length(es->rtable) > 1;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		int			n;
+
+		for (n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..68920ecd89 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..f32876f412 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3487,3 +3487,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use.
+ * This must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..d4c50c261d 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -293,6 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -513,6 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -989,6 +998,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1058,6 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1350,6 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 5662e7d742..7f76394851 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -309,6 +310,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Sort:
 			result = (PlanState *) ExecInitSort((Sort *) node,
 												estate, eflags);
@@ -695,6 +701,10 @@ ExecEndNode(PlanState *node)
 			ExecEndMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecEndSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..3752387ef4
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1110 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *-------------------------------------------------------------------------
+ */
+ /*
+  * INTERFACE ROUTINES
+  *		ExecResultCache			- materialize the result of a subplan
+  *		ExecInitResultCache		- initialize node and subnodes
+  *		ExecEndResultCache		- shutdown node and subnodes
+  *		ExecReScanResultCache	- rescan the result cache
+  */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * ResultCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;			/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node		lru_node;	/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;			/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32			hash;			/* Hash value (cached) */
+	char			status;			/* Hash status */
+	bool			complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int ResultCacheHash_equal(struct resultcache_hash *tb,
+								 const ResultCacheKey *params1,
+								 const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot	 *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid			*collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])			/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+								  collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used; instead, the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int				numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple   *tuple = entry->tuplehead;
+	uint64				freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey	   *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* XXX I don't really plan on keeping this */
+	{
+		int i, count;
+		uint64 mem = 0;
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+
+				ResultCacheTuple   *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the
+ * entry which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool					specialkey_intact = true;		/* for now */
+	dlist_mutable_iter		iter;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey   *key = dlist_container(ResultCacheKey, lru_node,
+												iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this
+		 * LRU entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * specialkey_intact to false to inform the caller it has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey		   *key;
+	ResultCacheEntry	   *entry;
+	MemoryContext			oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/* Move existing entry to the tail of the LRU list */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple	   *tuple;
+	ResultCacheEntry	   *entry = rcstate->entry;
+	MemoryContext			oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+		rcstate->last_tuple = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		rcstate->last_tuple->next = tuple;
+		rcstate->last_tuple = tuple;
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
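+/*
+ * ExecResultCache
+ *		Fetch a tuple for the current scan, either from the cache or, on a
+ *		cache miss, from the outer subplan, adding it to the cache when
+ *		possible.
+ *
+ * A short summary of the state transitions handled below:
+ *	 RC_CACHE_LOOKUP: on a complete cache hit -> RC_CACHE_FETCH_NEXT_TUPLE;
+ *		on a cache miss -> RC_FILLING_CACHE; if the new entry or its first
+ *		tuple cannot be stored -> RC_CACHE_BYPASS_MODE.
+ *	 RC_FILLING_CACHE: if a tuple cannot be stored -> RC_CACHE_BYPASS_MODE.
+ *	 All states move to RC_END_OF_SCAN once there are no more tuples to
+ *	 return.  ExecReScanResultCache() resets the state to RC_CACHE_LOOKUP.
+ */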
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1; /* stats update */
+
+					/* Fetch the first cached tuple, if there is one */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* No tuples in this cache entry. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1; /* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to
+						 * failure to free enough cache space, so ensure we
+						 * don't do anything here that assumes it worked.
+						 * There's no need to go into bypass mode here as
+						 * we're setting rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+						!cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1; /* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				ResultCacheEntry	*entry = node->entry;
+				Assert(entry != NULL);
+
+				/* Skip to the next tuple to output. */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.
+					 * XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1;			/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	} /* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to perform a cache lookup.  We won't find
+	 * anything until we cache something, but this saves having a special
+	 * case to create the first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												   &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations;	/* Just point directly to the plan data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = work_mem * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
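+	/* e.g. roughly 80kB of headroom with the default 4MB work_mem */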
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * can skip to the next outer tuple after getting the first matching
+	 * inner tuple.  In this case, the cache entry is complete after getting
+	 * the first tuple, so we can mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/*
+	 * Allocate and set up the actual cache.  We'll just use 1024 buckets if
+	 * the planner failed to come up with a better value.
+	 */
+	build_hash_table(rcstate, node->est_entries > 0 ? node->est_entries :
+					 1024);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the
+	 * main process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must look up the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		   sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+					+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d8cf87e6d0..db0b75e252 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -927,6 +927,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4937,6 +4964,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..d747d90d6f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -836,6 +836,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT64_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3809,6 +3839,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4043,6 +4076,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..d5931b1651 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d984da25d7..72b0aa6b2e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4073,6 +4073,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 97758dc41c..120e82eb6b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -133,6 +134,7 @@ bool		enable_hashagg = true;
 bool		hashagg_avoid_disk_plan = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2297,6 +2299,148 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines and returns the estimated cost of using a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+					  ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor will
+	 * use.  If we leave this at zero the executor will just choose the size
+	 * itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total of this node and how
+	 * many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
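+	/*
+	 * As an illustration only: with calls = 1000, ndistinct = 100 and
+	 * est_cache_entries = 50, the two formulas above give evict_ratio =
+	 * 1 - 50/100 = 0.5 and hit_ratio = 50/100 - 100/1000 = 0.4, i.e. we
+	 * expect 40% of rescans to be served from the cache.
+	 */
+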
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup.  This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4022,6 +4166,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..53f259fa55 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,162 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows,
+ *		including looking for volatile functions in the inner side of the
+ *		join.  Also, fetch outer side exprs and check for valid hashable
+ *		equality operator for each outer expr.  Returns true and sets the
+ *		'param_exprs' and 'operators' output parameters if the caching is
+ *		possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	List	   *clauses = param_info->ppi_clauses;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	Assert(list_length(clauses) > 0);
+
+	foreach(lc, clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		OpExpr	   *opexpr;
+		TypeCacheEntry *typentry;
+		Node	   *expr;
+
+		opexpr = (OpExpr *) rinfo->clause;
+
+		/* ppi_clauses should always meet this requirement */
+		if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+			!clause_sides_match_join(rinfo, outerrel, innerrel))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		if (rinfo->outer_is_left)
+			expr = (Node *) list_nth(opexpr->args, 0);
+		else
+			expr = (Node *) list_nth(opexpr->args, 1);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
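+ *
+ *		For example, with a parameterized nested loop implementing
+ *		"t1 JOIN t2 ON t1.x = t2.x", the inner scan of t2 is parameterized
+ *		by t1.x, and the Result Cache added above it caches the inner
+ *		scan's rows keyed on the current value of t1.x.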
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/* We can only have a result cache when there's some kind of cache key */
+	if (inner_path->param_info == NULL ||
+		inner_path->param_info->ppi_clauses == NIL)
+		return NULL;
+
+	/*
+	 * We can't use a result cache when a lateral join var is required from
+	 * somewhere else other than the inner side of the join.
+	 *
+	 * XXX maybe we can just include lateral_vars from above this in the
+	 * result cache's keys?  Not today though. It seems likely to reduce cache
+	 * hits which may make it very seldom worthwhile.
+	 */
+	if (!bms_is_subset(innerrel->lateral_relids, innerrel->relids))
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean resultcache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -376,6 +543,8 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+	Path	   *inner_cache_path;
+	bool		added_path = false;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -458,12 +627,92 @@ try_nestloop_path(PlannerInfo *root,
 									  extra->restrictlist,
 									  pathkeys,
 									  required_outer));
+		added_path = true;
+	}
+
+	/*
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
+	 */
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
+
+	if (inner_cache_path == NULL)
+	{
+		if (!added_path)
+			bms_free(required_outer);
+		return;
+	}
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+
+				if (!added_path)
+					bms_free(required_outer);
+				return;
+			}
+		}
+
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  extra,
+									  outer_path,
+									  inner_cache_path,
+									  extra->restrictlist,
+									  pathkeys,
+									  required_outer));
+		added_path = true;
 	}
 	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
+	}
+
+	if (!added_path)
 	{
 		/* Waste no memory when we reject a path here */
 		bms_free(required_outer);
 	}
 }
 
 /*
@@ -481,6 +730,9 @@ try_partial_nestloop_path(PlannerInfo *root,
 						  JoinPathExtraData *extra)
 {
 	JoinCostWorkspace workspace;
+	RelOptInfo *innerrel = inner_path->parent;
+	RelOptInfo *outerrel = outer_path->parent;
+	Path	   *inner_cache_path;
 
 	/*
 	 * If the inner path is parameterized, the parameterization must be fully
@@ -492,7 +744,6 @@ try_partial_nestloop_path(PlannerInfo *root,
 	if (inner_path->param_info != NULL)
 	{
 		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
-		RelOptInfo *outerrel = outer_path->parent;
 		Relids		outerrelids;
 
 		/*
@@ -511,41 +762,114 @@ try_partial_nestloop_path(PlannerInfo *root,
 
 	/*
 	 * Before creating a path, get a quick lower bound on what it is likely to
-	 * cost.  Bail out right away if it looks terrible.
+	 * cost.  Don't bother if it looks terrible.
 	 */
 	initial_cost_nestloop(root, &workspace, jointype,
 						  outer_path, inner_path, extra);
-	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
-		return;
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+		{
+			inner_path = reparameterize_path_by_child(root, inner_path,
+													  outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!inner_path)
+				return;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
 
 	/*
-	 * If the inner path is parameterized, it is parameterized by the topmost
-	 * parent of the outer rel, not the outer rel itself.  Fix that.
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
 	 */
-	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
-	{
-		inner_path = reparameterize_path_by_child(root, inner_path,
-												  outer_path->parent);
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
 
+	if (inner_cache_path == NULL)
+		return;
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
 		/*
-		 * If we could not translate the path, we can't create nest loop path.
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
 		 */
-		if (!inner_path)
-			return;
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+				return;
+			}
+			else
+				inner_cache_path = reparameterize_path;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_cache_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
+	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
 	}
 
-	/* Might be good enough to be worth trying, so let's try it. */
-	add_partial_path(joinrel, (Path *)
-					 create_nestloop_path(root,
-										  joinrel,
-										  jointype,
-										  &workspace,
-										  extra,
-										  outer_path,
-										  inner_path,
-										  extra->restrictlist,
-										  pathkeys,
-										  NULL));
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eb9543f6ad..05223a835c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1516,6 +1529,55 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6359,6 +6421,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6947,6 +7031,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6992,6 +7077,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e946..13d1af1df1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -677,6 +677,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			break;
 
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index b02fcb9bfe..16f45f38b3 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -135,6 +136,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side-effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -232,6 +301,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless.  It might be better to do this only when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	 */
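+	/*
+	 * For example, a correlated scalar subquery such as
+	 *		(SELECT count(*) FROM t2 WHERE t2.y = t1.x)
+	 * in the outer query's target list is an EXPR_SUBLINK whose plan is
+	 * parameterized by t1.x, so repeated values of t1.x can be answered
+	 * from the cache instead of rescanning the subplan.
+	 */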
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *cache_path;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any ideas
+			 * what that'll be.
+			 */
+			cache_path = create_resultcache_path(root,
+												 best_path->parent,
+												 best_path,
+												 param_exprs,
+												 operators,
+												 false,
+												 -1);
+			best_path = (Path *) cache_path;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
@@ -2684,6 +2787,13 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			/* XXX Check this is correct */
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 37d6d293c3..4f29b5b4e2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1519,6 +1519,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3816,6 +3865,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4053,6 +4113,15 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
 		case T_UniquePath:
 			{
 				UniquePath *upath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 75fc6f11d6..42c1d400e2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1021,6 +1021,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of caching results from parameterized plan nodes."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3a25287a39..481e1b6005 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -356,6 +356,7 @@
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
+#enable_resultcache = on
 #enable_mergejoin = on
 #enable_nestloop = on
 #enable_parallel_append = on
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c7deeac662..3a3a24941d 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -263,6 +263,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f5dfa32d55..90a114142e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1982,6 +1983,69 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of times we've skipped the subnode
+								 * scan due to tuples already being cached */
+	uint64		cache_misses;	/* number of times we've had to scan the
+								 * subnode to fetch tuples */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* limit the size of the cache to this (bytes) */
+	uint64		mem_lowerlimit; /* reduce memory usage below this when we free
+								 * up space */
+	MemoryContext tableContext; /* memory context for actual cache */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..94ab62f318 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -241,6 +243,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..671fbe81e8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache of
+ * tuples from a parameterized path, which saves the underlying node from
+ * having to be rescanned for parameter values that are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects us to hold, or 0 if unknown
+								 */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..0512f1ae1c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,26 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for the cache keys */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects us to hold */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 92e70ec0d9..ab4f24648f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -58,6 +58,7 @@ extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool hashagg_avoid_disk_plan;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..816fb3366f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3bd184ae29..bdc8f3c742 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -950,12 +950,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -2523,6 +2525,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2538,6 +2541,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 03ada654bb..d78be811d9 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -742,19 +742,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..d5a8eba085 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -2973,8 +2975,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2984,11 +2986,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
@@ -3510,8 +3514,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3521,17 +3525,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3541,9 +3547,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4890,14 +4898,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..5c826792f5 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
-                 Output: (hjtest_1.b * 5)
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
-                       Output: (hjtest_2.c * 5)
+                 ->  Result Cache
+                       Output: ((hjtest_2.c * 5))
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
-           Output: (hjtest_1.b * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_1.b * 5))
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
+                 ->  Result Cache
+                       Output: ((hjtest_1.b * 5))
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
+         SubPlan 2
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
                  ->  Result
                        Output: (hjtest_1.b * 5)
-         SubPlan 2
-           ->  Result
-                 Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
-           Output: (hjtest_2.c * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_2.c * 5))
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 4315e8e0a3..acee21c08e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2060,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2070,36 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(31 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2145,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2182,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                          explain_parallel_append                                           
+------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2220,30 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..3a920c083a
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,100 @@
+-- Perform tests on the Result Cache node.
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.twenty
+           Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 5339  Cache Misses: 4661 Cache Evictions: 4056  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=4661)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=4661)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+RESET work_mem;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on tenk1 t1 (actual rows=3333 loops=3)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(12 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=10000 loops=1)
+         ->  Seq Scan on tenk1 t2 (actual rows=10000 loops=1)
+         ->  Result Cache (actual rows=1 loops=10000)
+               Cache Key: t2.twenty
+               Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(9 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+ count |        avg         
+-------+--------------------
+ 10000 | 9.5000000000000000
+(1 row)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..0d2b3c5c10 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1167,9 +1171,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 4c6cd5f146..9993bca2fd 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -844,19 +844,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 01b7786f01..331767c4dd 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -87,10 +87,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..317cd56eb2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..04f0473b92 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -198,6 +198,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 044d515507..2eac836e76 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1076,9 +1076,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..b0bc88140f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6658455a74..bc923ae873 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..ecf857c7f6
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,32 @@
+-- Perform tests on the Result Cache node.
+
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET work_mem;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
-- 
2.25.1

rc_1million_bench.png (image/png)
rc_10million_bench.png (image/png)
rc_100million_bench.png (image/png)
#13David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#12)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 8 Jul 2020 at 00:32, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached v4.

Thomas pointed out to me earlier that, per the CFbot, v4 was
generating a new compiler warning. Andres pointed out to me that I
could fix the warnings about the unused functions in simplehash.h by
changing their scope from static to static inline.

The attached v5 patch set fixes that.
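
For context, the fix comes down to the SH_SCOPE that simplehash.h users
define for the generated functions: with plain "static", any generated
helper that a translation unit never calls triggers -Wunused-function,
whereas "static inline" does not.  Below is a minimal, hypothetical
instantiation showing the idea; the type and prefix names are made up for
illustration and are not part of the patch:

#include "postgres.h"
#include "common/hashfn.h"

/* Hypothetical element type; simplehash.h requires the status member. */
typedef struct DemoCacheEntry
{
	int			key;
	char		status;
} DemoCacheEntry;

#define SH_PREFIX demohash
#define SH_ELEMENT_TYPE DemoCacheEntry
#define SH_KEY_TYPE int
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32((uint32) (key))
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline	/* avoids warnings for unused generated functions */
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"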

David

Attachments:

v5-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch (application/octet-stream)
From d8ff0aa0da854905c01e9f35ba7bc2abf6b495e4 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v5 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() so that it can
set bits in a flags variable to pass additional details back to the caller,
which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..70f6fa2493 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2960,7 +2960,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 87c9b49ce1..9a403a64d6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1865,7 +1865,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..ca3132d9b7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2073,6 +2073,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 14f3fd44e3..3d3cf431df 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3719,7 +3719,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3744,7 +3745,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3760,7 +3762,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4777,7 +4779,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..7e9df9461e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e845a4b1ae..37d6d293c3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1656,6 +1656,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index be08eb4814..2c5bfaf628 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3238,6 +3238,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3284,6 +3285,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3308,6 +3310,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3355,7 +3362,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3363,6 +3370,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3566,6 +3577,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.25.1

v5-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch (application/octet-stream)
From 8e5c88965df0953b0233ace87fdbc67cc6c211c1 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v5 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but an upcoming commit adds a caller
which already has the element it would like to delete, so it can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 116 +++++++++++++++++++----------------
 1 file changed, 62 insertions(+), 54 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 90dfa8a695..8c74c467ac 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -79,6 +79,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -163,6 +164,7 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
 SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
@@ -763,75 +765,81 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
- * present.
+ * Delete 'entry' from hash table.
  */
-SH_SCOPE bool
-SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 {
-	uint32		hash = SH_HASH_KEY(tb, key);
-	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
-	uint32		curelem = startelem;
-
-	while (true)
-	{
-		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
-
-		if (entry->status == SH_STATUS_EMPTY)
-			return false;
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		curelem;
+	uint32		startelem;
 
-		if (entry->status == SH_STATUS_IN_USE &&
-			SH_COMPARE_KEYS(tb, hash, key, entry))
-		{
-			SH_ELEMENT_TYPE *lastentry = entry;
+	Assert(entry >= &tb->data[0] && entry < &tb->data[tb->size]);
 
-			tb->members--;
+	/* Calculate the index of 'entry' */
+	startelem = curelem = entry - &tb->data[0];
 
-			/*
-			 * Backward shift following elements till either an empty element
-			 * or an element at its optimal position is encountered.
-			 *
-			 * While that sounds expensive, the average chain length is short,
-			 * and deletions would otherwise require tombstones.
-			 */
-			while (true)
-			{
-				SH_ELEMENT_TYPE *curentry;
-				uint32		curhash;
-				uint32		curoptimal;
+	tb->members--;
 
-				curelem = SH_NEXT(tb, curelem, startelem);
-				curentry = &tb->data[curelem];
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
 
-				if (curentry->status != SH_STATUS_IN_USE)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
 
-				curhash = SH_ENTRY_HASH(tb, curentry);
-				curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				/* current is at optimal position, done */
-				if (curoptimal == curelem)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
 
-				/* shift */
-				memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				lastentry = curentry;
-			}
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
 
-			return true;
-		}
+		lastentry = curentry;
+	}
+}
 
-		/* TODO: return false; if distance too big */
+/*
+ * Perform hash table lookup on 'key', delete the entry belonging to it and
+ * return true.  Returns false if no item could be found relating to 'key'.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+{
+	SH_ELEMENT_TYPE *entry = SH_LOOKUP(tb, key);
 
-		curelem = SH_NEXT(tb, curelem, startelem);
+	if (likely(entry != NULL))
+	{
+		/*
+		 * Perform deletion and also the relocation of subsequent items which
+		 * are not in their optimal position but can now be moved up.
+		 */
+		SH_DELETE_ITEM(tb, entry);
+		return true;
 	}
+
+	return false;		/* Can't find 'key' */
 }
 
 /*
-- 
2.25.1
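
In terms of the hypothetical demohash instantiation sketched earlier in
this mail, the difference between the existing key-based delete and the
new direct delete looks roughly like this.  Again, this is just an
illustrative sketch, not code from the patch:

#include "postgres.h"

/* Assumes the hypothetical demohash simplehash table with int keys */
static void
demo_delete_both_ways(demohash_hash *table)
{
	DemoCacheEntry *entry;

	/* Key-based delete: re-hashes the key and probes the table internally. */
	if (!demohash_delete(table, 42))
		elog(LOG, "key 42 was not present");

	/*
	 * Direct delete: when we already hold the entry pointer, for example
	 * from a lookup performed for other reasons, the extra probe can be
	 * skipped.
	 */
	entry = demohash_lookup(table, 43);
	if (entry != NULL)
		demohash_delete_item(table, entry);
}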

v5-0003-Add-Result-Cache-executor-node.patch (application/octet-stream)
From b8596a56095e822303c6f606bd42a7275a5d019c Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v5 3/3] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we remove entries starting
with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.

We also support caching the results from correlated subqueries.  However,
currently, since subqueries are planned before their parent query, we are
unable to obtain any estimate of the cache hit ratio.  For now, we opt to
always put a Result Cache above a suitable correlated subquery.  In the
future, we may like to be smarter about that, but for now, the overhead of
using the Result Cache is minimal, even in cases where we never get a
cache hit.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   28 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  112 ++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  132 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1111 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  149 +++
 src/backend/optimizer/path/joinpath.c         |  374 +++++-
 src/backend/optimizer/plan/createplan.c       |   86 ++
 src/backend/optimizer/plan/setrefs.c          |    1 +
 src/backend/optimizer/plan/subselect.c        |  110 ++
 src/backend/optimizer/util/pathnode.c         |   69 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    6 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   64 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   20 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    8 +-
 src/test/regress/expected/groupingsets.out    |   20 +-
 src/test/regress/expected/join.out            |   51 +-
 src/test/regress/expected/join_hash.out       |   72 +-
 src/test/regress/expected/partition_prune.out |  242 ++--
 src/test/regress/expected/resultcache.out     |  100 ++
 src/test/regress/expected/rowsecurity.out     |   20 +-
 src/test/regress/expected/select_parallel.out |   28 +-
 src/test/regress/expected/subselect.out       |   24 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    2 +
 src/test/regress/sql/resultcache.sql          |   32 +
 47 files changed, 2878 insertions(+), 229 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..a5d697bd7a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1581,6 +1581,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1607,6 +1608,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2914,10 +2916,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2928,8 +2933,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2939,11 +2944,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..00b3567e0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -480,10 +480,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 02909b1e66..b65090ec35 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4637,6 +4637,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 093864cfc0..10a4fa83b6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1279,6 +1281,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1970,6 +1975,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3043,6 +3052,109 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+
+	initStringInfo(&keystr);
+
+	/* XXX surely we'll always have more than one if we have a resultcache? */
+	useprefix = list_length(es->rtable) > 1;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		int			n;
+
+		for (n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..68920ecd89 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..f32876f412 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3487,3 +3487,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use.
+ *	Must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison.  Must be the
+ *	same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..d4c50c261d 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -293,6 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *)planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -513,6 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -989,6 +998,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1058,6 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1350,6 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 5662e7d742..7f76394851 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -309,6 +310,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Sort:
 			result = (PlanState *) ExecInitSort((Sort *) node,
 												estate, eflags);
@@ -695,6 +701,10 @@ ExecEndNode(PlanState *node)
 			ExecEndMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecEndSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..82d33e1b78
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1111 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *-------------------------------------------------------------------------
+ */
+ /*
+  * INTERFACE ROUTINES
+  *		ExecResultCache			- materialize the result of a subplan
+  *		ExecInitResultCache		- initialize node and subnodes
+  *		ExecEndResultCache		- shutdown node and subnodes
+  *		ExecReScanResultCache	- rescan the result cache
+  */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * ResultCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;			/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node		lru_node;	/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;			/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32			hash;			/* Hash value (cached) */
+	char			status;			/* Hash status */
+	bool			complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int ResultCacheHash_equal(struct resultcache_hash *tb,
+								 const ResultCacheKey *params1,
+								 const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot	 *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid			*collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])			/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+								  collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used, instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int				numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple   *tuple = entry->tuplehead;
+	uint64				freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey	   *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* XXX I don't really plan on keeping this */
+	{
+		int i, count;
+		uint64 mem = 0;
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+
+				ResultCacheTuple   *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the
+ * entry which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool					specialkey_intact = true;		/* for now */
+	dlist_mutable_iter		iter;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey   *key = dlist_container(ResultCacheKey, lru_node,
+												iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this
+		 * LRU entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * specialkey_intact to false to inform the caller that the entry has
+		 * been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove it from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey		   *key;
+	ResultCacheEntry	   *entry;
+	MemoryContext			oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Look up or add an entry for the current parameter values.  No need to
+	 * pass a valid key since the hash function uses rcstate's probeslot,
+	 * which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/* Move existing entry to the tail of the LRU list */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple	   *tuple;
+	ResultCacheEntry	   *entry = rcstate->entry;
+	MemoryContext			oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+		rcstate->last_tuple = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		rcstate->last_tuple->next = tuple;
+		rcstate->last_tuple = tuple;
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
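+/*
+ * ExecResultCache
+ *		Look up the cache using the scan's current parameters.  If we find a
+ *		complete entry, return the cached tuples, otherwise fetch tuples from
+ *		the outer node and try to add them to the cache for future scans.
+ */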
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1; /* stats update */
+
+					/* Fetch the first cached tuple, if there is one */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* No tuples in this cache entry. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1; /* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to
+						 * failure to free enough cache space, so ensure we
+						 * don't do anything here that assumes it worked.
+						 * There's no need to go into bypass mode here as
+						 * we're setting rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+						!cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1; /* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output. */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.
+					 * XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1;			/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	} /* switch */
+}
+
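+/*
+ * ExecInitResultCache
+ *		Initialize the ResultCacheState node, including the slots, hash
+ *		functions and equality expression used for cache key lookups, and
+ *		build the hash table which backs the cache.
+ */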
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												   &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations;	/* Just point directly to the plan data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
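+	/*
+	 * Look up the hash function for each hash operator and the Oid of the
+	 * function implementing the operator itself.  These are used to hash the
+	 * parameter values and to build the cache key equality expression below.
+	 * Also initialize an ExprState for each parameter expression.
+	 */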
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = work_mem * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match. e.g. A join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple, so we can mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/*
+	 * Allocate and set up the actual cache.  We'll just use 1024 buckets if
+	 * the planner failed to come up with a better value.
+	 */
+	build_hash_table(rcstate, node->est_entries > 0 ? node->est_entries :
+					 1024);
+
+	return rcstate;
+}
+
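+/*
+ * ExecEndResultCache
+ *		Shut down the Result Cache node, freeing the memory used by the cache
+ *		and shutting down the outer subplan.
+ */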
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the
+	 * main process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
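+/*
+ * ExecReScanResultCache
+ *		Prepare for a rescan with a new set of parameters by forcing a fresh
+ *		cache lookup on the next call.  The cached results are kept.
+ */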
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		   sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+					+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d8cf87e6d0..db0b75e252 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -927,6 +927,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4937,6 +4964,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..27cc4c1864 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -836,6 +836,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3809,6 +3839,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4043,6 +4076,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..d5931b1651 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e1967f12..b0ad218348 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4076,6 +4076,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 9a403a64d6..51979a58c8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -133,6 +134,7 @@ bool		enable_hashagg = true;
 bool		hashagg_avoid_disk_plan = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2297,6 +2299,148 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines and returns the estimated cost of using a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Estimate the number of bytes each cache entry is expected to consume.
+	 * To better estimate how many cache entries we can store at once, we ask
+	 * the executor what memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+					  ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
+
+	/* estimate the number of distinct parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor will
+	 * use.  If we leave this at zero the executor will just choose the size
+	 * itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to be cache hits.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of if it's a cache hit or not.
+	 * will happen regardless of whether it's a cache hit or not.
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4022,6 +4166,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..53f259fa55 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,162 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows,
+ *		including looking for volatile functions in the inner side of the
+ *		join.  Also, fetch outer side exprs and check for valid hashable
+ *		join.  Also, fetch the outer side exprs and check that each one has a
+ *		valid hashable equality operator.  Returns true and sets the
+ *		possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	List	   *clauses = param_info->ppi_clauses;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	Assert(list_length(clauses) > 0);
+
+	foreach(lc, clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		OpExpr	   *opexpr;
+		TypeCacheEntry *typentry;
+		Node	   *expr;
+
+		opexpr = (OpExpr *) rinfo->clause;
+
+		/* ppi_clauses should always meet this requirement */
+		if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+			!clause_sides_match_join(rinfo, outerrel, innerrel))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		if (rinfo->outer_is_left)
+			expr = (Node *) list_nth(opexpr->args, 0);
+		else
+			expr = (Node *) list_nth(opexpr->args, 1);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/* We can only have a result cache when there's some kind of cache key */
+	if (inner_path->param_info == NULL ||
+		inner_path->param_info->ppi_clauses == NIL)
+		return NULL;
+
+	/*
+	 * We can't use a result cache when a lateral join var is required from
+	 * somewhere other than the inner side of the join.
+	 *
+	 * XXX maybe we can just include lateral_vars from above this in the
+	 * result cache's keys?  Not today though. It seems likely to reduce cache
+	 * hits which may make it very seldom worthwhile.
+	 */
+	if (!bms_is_subset(innerrel->lateral_relids, innerrel->relids))
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -376,6 +543,8 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+	Path	   *inner_cache_path;
+	bool		added_path = false;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -458,12 +627,92 @@ try_nestloop_path(PlannerInfo *root,
 									  extra->restrictlist,
 									  pathkeys,
 									  required_outer));
+		added_path = true;
+	}
+
+	/*
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
+	 */
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
+
+	if (inner_cache_path == NULL)
+	{
+		if (!added_path)
+			bms_free(required_outer);
+		return;
+	}
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+
+				if (!added_path)
+					bms_free(required_outer);
+				return;
+			}
+		}
+
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  extra,
+									  outer_path,
+									  inner_cache_path,
+									  extra->restrictlist,
+									  pathkeys,
+									  required_outer));
+		added_path = true;
 	}
 	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
+	}
+
+	if (!added_path)
 	{
 		/* Waste no memory when we reject a path here */
 		bms_free(required_outer);
 	}
+
 }
 
 /*
@@ -481,6 +730,9 @@ try_partial_nestloop_path(PlannerInfo *root,
 						  JoinPathExtraData *extra)
 {
 	JoinCostWorkspace workspace;
+	RelOptInfo *innerrel = inner_path->parent;
+	RelOptInfo *outerrel = outer_path->parent;
+	Path	   *inner_cache_path;
 
 	/*
 	 * If the inner path is parameterized, the parameterization must be fully
@@ -492,7 +744,6 @@ try_partial_nestloop_path(PlannerInfo *root,
 	if (inner_path->param_info != NULL)
 	{
 		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
-		RelOptInfo *outerrel = outer_path->parent;
 		Relids		outerrelids;
 
 		/*
@@ -511,41 +762,114 @@ try_partial_nestloop_path(PlannerInfo *root,
 
 	/*
 	 * Before creating a path, get a quick lower bound on what it is likely to
-	 * cost.  Bail out right away if it looks terrible.
+	 * cost.  Don't bother if it looks terrible.
 	 */
 	initial_cost_nestloop(root, &workspace, jointype,
 						  outer_path, inner_path, extra);
-	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
-		return;
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
+
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+		{
+			inner_path = reparameterize_path_by_child(root, inner_path,
+													  outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!inner_path)
+				return;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
 
 	/*
-	 * If the inner path is parameterized, it is parameterized by the topmost
-	 * parent of the outer rel, not the outer rel itself.  Fix that.
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
 	 */
-	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
-	{
-		inner_path = reparameterize_path_by_child(root, inner_path,
-												  outer_path->parent);
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
 
+	if (inner_cache_path == NULL)
+		return;
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
 		/*
-		 * If we could not translate the path, we can't create nest loop path.
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
 		 */
-		if (!inner_path)
-			return;
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+				return;
+			}
+			else
+				inner_cache_path = reparameterize_path;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_cache_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
+	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
 	}
 
-	/* Might be good enough to be worth trying, so let's try it. */
-	add_partial_path(joinrel, (Path *)
-					 create_nestloop_path(root,
-										  joinrel,
-										  jointype,
-										  &workspace,
-										  extra,
-										  outer_path,
-										  inner_path,
-										  extra->restrictlist,
-										  pathkeys,
-										  NULL));
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eb9543f6ad..05223a835c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1516,6 +1529,55 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6359,6 +6421,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6947,6 +7031,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6992,6 +7077,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e946..13d1af1df1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -677,6 +677,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			break;
 
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index b02fcb9bfe..16f45f38b3 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -135,6 +136,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -232,6 +301,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless. It may be useful if we can only do this when it seems
+	 * regardless.  It might be better to only do this when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *cache_path;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			cache_path = create_resultcache_path(root,
+												 best_path->parent,
+												 best_path,
+												 param_exprs,
+												 operators,
+												 false,
+												 -1);
+			best_path = (Path *) cache_path;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
@@ -2684,6 +2787,13 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			/* XXX Check this is correct */
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 37d6d293c3..4f29b5b4e2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1519,6 +1519,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3816,6 +3865,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4053,6 +4113,15 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
 		case T_UniquePath:
 			{
 				UniquePath *upath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3a802d8627..e1ec4c46df 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1021,6 +1021,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of caching results from parameterized plan nodes."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0d98e546a6..fc522bfee1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -356,6 +356,7 @@
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
+#enable_resultcache = on
 #enable_mergejoin = on
 #enable_nestloop = on
 #enable_parallel_append = on
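
As with the other planner method GUCs listed here, the new setting can be toggled per session; for example (illustration only, not part of the patch; the default is on, per the guc.c hunk above):

    SET enable_resultcache TO off;   -- stop the planner from considering Result Cache nodes
    SHOW enable_resultcache;         -- off
    RESET enable_resultcache;        -- back to the default (on)
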
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c7deeac662..3a3a24941d 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -263,6 +263,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f5dfa32d55..90a114142e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1982,6 +1983,69 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of times we've skipped the subnode
+								 * scan due to tuples already being cached */
+	uint64		cache_misses;	/* number of times we've had to scan the
+								 * subnode to fetch tuples */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* limit the size of the cache to this (bytes) */
+	uint64		mem_lowerlimit; /* reduce memory usage below this when we free
+								 * up space */
+	MemoryContext tableContext; /* memory context for actual cache */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..94ab62f318 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -241,6 +243,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..671fbe81e8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects us to hold, or 0 if unknown
+								 */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..0512f1ae1c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,26 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects us to hold */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 613db8eab6..7ce6a1bb5f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -58,6 +58,7 @@ extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool hashagg_avoid_disk_plan;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..816fb3366f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3bd184ae29..bdc8f3c742 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -950,12 +950,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -2523,6 +2525,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2538,6 +2541,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 03ada654bb..d78be811d9 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -742,19 +742,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..d5a8eba085 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -2973,8 +2975,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2984,11 +2986,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
@@ -3510,8 +3514,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3521,17 +3525,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3541,9 +3547,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4890,14 +4898,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..5c826792f5 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
-                 Output: (hjtest_1.b * 5)
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
-                       Output: (hjtest_2.c * 5)
+                 ->  Result Cache
+                       Output: ((hjtest_2.c * 5))
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
-           Output: (hjtest_1.b * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_1.b * 5))
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
+                 ->  Result Cache
+                       Output: ((hjtest_1.b * 5))
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
+         SubPlan 2
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
                  ->  Result
                        Output: (hjtest_1.b * 5)
-         SubPlan 2
-           ->  Result
-                 Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
-           Output: (hjtest_2.c * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_2.c * 5))
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 4315e8e0a3..acee21c08e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2060,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2070,36 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(31 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2145,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2182,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                          explain_parallel_append                                           
+------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2220,30 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..3a920c083a
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,100 @@
+-- Perform tests on the Result Cache node.
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.twenty
+           Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 5339  Cache Misses: 4661 Cache Evictions: 4056  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=4661)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=4661)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+RESET work_mem;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on tenk1 t1 (actual rows=3333 loops=3)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(12 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=10000 loops=1)
+         ->  Seq Scan on tenk1 t2 (actual rows=10000 loops=1)
+         ->  Result Cache (actual rows=1 loops=10000)
+               Cache Key: t2.twenty
+               Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(9 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+ count |        avg         
+-------+--------------------
+ 10000 | 9.5000000000000000
+(1 row)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..0d2b3c5c10 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1167,9 +1171,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 4c6cd5f146..9993bca2fd 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -844,19 +844,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..1bd175d992 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -87,10 +87,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..317cd56eb2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..04f0473b92 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -198,6 +198,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 044d515507..2eac836e76 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1076,9 +1076,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..b0bc88140f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6658455a74..bc923ae873 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..ecf857c7f6
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,32 @@
+-- Perform tests on the Result Cache node.
+
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET work_mem;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
-- 
2.25.1

#14Andres Freund
andres@anarazel.de
In reply to: David Rowley (#1)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2020-05-20 23:44:27 +1200, David Rowley wrote:

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names I mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

Greetings,

Andres Freund

#15David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#14)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:

On 2020-05-20 23:44:27 +1200, David Rowley wrote:

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names I mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

Here are a couple of examples, one with a Limit below the Result Cache
and one with a GroupAggregate.

postgres=# explain (costs off) select * from pg_Class c1 where relname
= (select relname from pg_Class c2 where c1.relname = c2.relname
offset 1 limit 1);
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Seq Scan on pg_class c1
   Filter: (relname = (SubPlan 1))
   SubPlan 1
     ->  Result Cache
           Cache Key: c1.relname
           ->  Limit
                 ->  Index Only Scan using pg_class_relname_nsp_index on pg_class c2
                       Index Cond: (relname = c1.relname)
(8 rows)

postgres=# explain (costs off) select * from pg_Class c1 where relname
= (select relname from pg_Class c2 where c1.relname = c2.relname group
by 1 having count(*) > 1);
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Seq Scan on pg_class c1
   Filter: (relname = (SubPlan 1))
   SubPlan 1
     ->  Result Cache
           Cache Key: c1.relname
           ->  GroupAggregate
                 Group Key: c2.relname
                 Filter: (count(*) > 1)
                 ->  Index Only Scan using pg_class_relname_nsp_index on pg_class c2
                       Index Cond: (relname = c1.relname)
(10 rows)

As for putting the logic somewhere like ExecReScan(), the first
paragraph in [1]/messages/by-id/CAApHDvr-yx9DEJ1Lc9aAy8QZkgEZkTP=3hBRBe83Vwo=kAndcA@mail.gmail.com covers my thoughts on that.

David

[1]: /messages/by-id/CAApHDvr-yx9DEJ1Lc9aAy8QZkgEZkTP=3hBRBe83Vwo=kAndcA@mail.gmail.com

#16David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#13)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 8 Jul 2020 at 15:37, David Rowley <dgrowleyml@gmail.com> wrote:

The attached v5 patch set fixes that.

I've attached an updated set of patches for this, refreshed after a recent conflict.

I'd like to push the 0002 patch quite soon as I think it's an
improvement to simplehash.h regardless of whether we get Result Cache.
It reuses the SH_LOOKUP function for deletes. Also, if we ever get
around to giving up on a lookup once we've strayed too far from the
optimal bucket, then that logic would only need to appear in one
location rather than two.

Andres, or anyone, any objections to me pushing 0002?

David

Attachments:

v6-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch (application/octet-stream)
From 977d6f8f42f1d33a698417593c62d94938abc9c9 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v6 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an up-coming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 116 +++++++++++++++++++----------------
 1 file changed, 63 insertions(+), 53 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..e7df323de5 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,75 +833,80 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
- * present.
+ * Delete 'entry' from hash table.
  */
-SH_SCOPE bool
-SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 {
-	uint32		hash = SH_HASH_KEY(tb, key);
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
 	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
-	uint32		curelem = startelem;
+	uint32		curelem;
 
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
 	while (true)
 	{
-		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
 
-		if (entry->status == SH_STATUS_EMPTY)
-			return false;
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
 
-		if (entry->status == SH_STATUS_IN_USE &&
-			SH_COMPARE_KEYS(tb, hash, key, entry))
+		if (curentry->status != SH_STATUS_IN_USE)
 		{
-			SH_ELEMENT_TYPE *lastentry = entry;
-
-			tb->members--;
-
-			/*
-			 * Backward shift following elements till either an empty element
-			 * or an element at its optimal position is encountered.
-			 *
-			 * While that sounds expensive, the average chain length is short,
-			 * and deletions would otherwise require tombstones.
-			 */
-			while (true)
-			{
-				SH_ELEMENT_TYPE *curentry;
-				uint32		curhash;
-				uint32		curoptimal;
-
-				curelem = SH_NEXT(tb, curelem, startelem);
-				curentry = &tb->data[curelem];
-
-				if (curentry->status != SH_STATUS_IN_USE)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
-
-				curhash = SH_ENTRY_HASH(tb, curentry);
-				curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				/* current is at optimal position, done */
-				if (curoptimal == curelem)
-				{
-					lastentry->status = SH_STATUS_EMPTY;
-					break;
-				}
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
 
-				/* shift */
-				memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
 
-				lastentry = curentry;
-			}
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
 
-			return true;
-		}
+		lastentry = curentry;
+	}
+}
 
-		/* TODO: return false; if distance too big */
+/*
+ * Perform hash table lookup on 'key', delete the entry belonging to it and
+ * return true.  Returns false if no item could be found relating to 'key'.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+{
+	SH_ELEMENT_TYPE *entry = SH_LOOKUP(tb, key);
 
-		curelem = SH_NEXT(tb, curelem, startelem);
+	if (likely(entry != NULL))
+	{
+		/*
+		 * Perform deletion and also the relocation of subsequent items which
+		 * are not in their optimal position but can now be moved up.
+		 */
+		SH_DELETE_ITEM(tb, entry);
+		return true;
 	}
+
+	return false;		/* Can't find 'key' */
 }
 
 /*
@@ -1102,6 +1111,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.21.0.windows.1
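
(Aside, not part of the 0002 patch: to make the change above a little more concrete, here is a minimal sketch of how a simplehash.h user could call the new delete_item function directly on an entry it already holds, instead of paying for a second lookup via <prefix>_delete.  The DemoEntry type, the "demo" prefix and the helper function are invented purely for illustration.)

/* Hypothetical simplehash.h instantiation, for illustration only. */
#include "postgres.h"
#include "common/hashfn.h"

typedef struct DemoEntry
{
	int32		key;			/* hash key */
	int64		value;			/* some payload */
	char		status;			/* entry status, required by simplehash.h */
} DemoEntry;

#define SH_PREFIX demo
#define SH_ELEMENT_TYPE DemoEntry
#define SH_KEY_TYPE int32
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32((uint32) (key))
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

/*
 * Remove the entry for 'key', if any, without doing the lookup twice.
 * demo_delete(hashtable, key) would also work, but with 0002 applied it is
 * just a lookup followed by demo_delete_item(), so a caller that already
 * has the entry in hand can go straight to the direct deletion.
 */
static void
demo_remove_if_present(demo_hash *hashtable, int32 key)
{
	DemoEntry  *entry = demo_lookup(hashtable, key);

	if (entry != NULL)
		demo_delete_item(hashtable, entry);
}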

v6-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch (application/octet-stream)
From 8fe645a94a839586ff4e8e93876b1f44d1fe3c25 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v6 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() so that it can
set bits in a flags variable to pass back additional details to the caller,
which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..70f6fa2493 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2960,7 +2960,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index fda4b2c6e8..5a7f5afb94 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1864,7 +1864,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index bcb1bc6097..4f6ab5d635 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1986,6 +1986,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b40a112c25..64d8cfb89f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3719,7 +3719,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3744,7 +3745,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3760,7 +3762,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4778,7 +4780,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2ebd4ea332..20b2025272 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c1fc866cbf..e528e05459 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53d974125f..0aca990537 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3365,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3373,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3580,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.21.0.windows.1
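
(Aside, not part of the patch series: a rough sketch of how a follow-up patch might consume the new "flags" output from estimate_num_groups().  The function name and the way the result is used are made up for illustration; only the estimate_num_groups() signature and SELFLAG_USED_DEFAULT come from 0001.)

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "utils/selfuncs.h"

/*
 * Estimate the number of distinct parameter combinations expected over
 * 'calls' executions, and report whether the estimate had to fall back on
 * the hard-coded default ndistinct constants.
 */
static double
estimate_distinct_params(PlannerInfo *root, List *param_exprs,
						 double calls, bool *est_is_default)
{
	int			flags;
	double		ndistinct;

	ndistinct = estimate_num_groups(root, param_exprs, calls, NULL, &flags);

	/*
	 * If any input lacked statistics, the estimate is based on defaults, so
	 * the caller may want to be more conservative, e.g. by not planning a
	 * Result Cache at all.
	 */
	*est_is_default = (flags & SELFLAG_USED_DEFAULT) != 0;

	return ndistinct;
}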

v6-0003-Add-Result-Cache-executor-node.patch (application/octet-stream)
From 70c7f1d9208c7d7281b79605d6ff5bc821937d03 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v6 3/3] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.

We also support caching the results from correlated subqueries.  However,
currently, since subqueries are planned before their parent query, we are
unable to obtain any estimations on the cache hit ratio.  For now, we opt
to just always put a Result Cache above a suitable correlated subquery. In
the future, we may like to be smarter about that, but for now, the
overhead of using the Result Cache, even in cases where we never get a
cache hit, is minimal.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   28 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  112 ++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  132 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1111 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  149 +++
 src/backend/optimizer/path/joinpath.c         |  374 +++++-
 src/backend/optimizer/plan/createplan.c       |   86 ++
 src/backend/optimizer/plan/setrefs.c          |    1 +
 src/backend/optimizer/plan/subselect.c        |  110 ++
 src/backend/optimizer/util/pathnode.c         |   70 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    6 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   64 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   20 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    8 +-
 src/test/regress/expected/groupingsets.out    |   20 +-
 src/test/regress/expected/join.out            |   51 +-
 src/test/regress/expected/join_hash.out       |   72 +-
 src/test/regress/expected/partition_prune.out |  242 ++--
 src/test/regress/expected/resultcache.out     |  100 ++
 src/test/regress/expected/rowsecurity.out     |   20 +-
 src/test/regress/expected/select_parallel.out |   28 +-
 src/test/regress/expected/subselect.out       |   24 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    2 +
 src/test/regress/sql/resultcache.sql          |   32 +
 47 files changed, 2879 insertions(+), 229 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..9090657b19 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1581,6 +1581,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1607,6 +1608,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2914,10 +2916,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2928,8 +2933,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2939,11 +2944,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..00b3567e0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -480,10 +480,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a7177c550..9d909d3c07 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4737,6 +4737,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1e565fd337..cf91d3701f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1279,6 +1281,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1970,6 +1975,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3043,6 +3052,109 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+
+	initStringInfo(&keystr);
+
+	/* XXX surely we'll always have more than one if we have a resultcache? */
+	useprefix = list_length(es->rtable) > 1;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		int			n;
+
+		for (n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Cache Hits: " UINT64_FORMAT "  Cache Misses: " UINT64_FORMAT " Cache Evictions: " UINT64_FORMAT "  Cache Overflows: " UINT64_FORMAT "\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..68920ecd89 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..f32876f412 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3487,3 +3487,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use.
+ * This must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..d4c50c261d 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -293,6 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *)planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -513,6 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -989,6 +998,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1058,6 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1350,6 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 01b7b926bf..f37cc48cd5 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -309,6 +310,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Sort:
 			result = (PlanState *) ExecInitSort((Sort *) node,
 												estate, eflags);
@@ -695,6 +701,10 @@ ExecEndNode(PlanState *node)
 			ExecEndMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecEndSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..82d33e1b78
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1111 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *-------------------------------------------------------------------------
+ */
+ /*
+  * INTERFACE ROUTINES
+  *		ExecResultCache			- materialize the result of a subplan
+  *		ExecInitResultCache		- initialize node and subnodes
+  *		ExecEndResultCache		- shutdown node and subnodes
+  *		ExecReScanResultCache	- rescan the result cache
+  */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len);
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * ResultCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;			/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node		lru_node;	/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;			/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32			hash;			/* Hash value (cached) */
+	char			status;			/* Hash status */
+	bool			complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
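+/*
+ * Define the hash table interface for the cache.  simplehash.h is included
+ * twice: first with SH_DECLARE to emit the type and function declarations,
+ * and again below with SH_DEFINE, once the hash and equality support
+ * functions have been declared, to emit the implementation.
+ */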
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int ResultCacheHash_equal(struct resultcache_hash *tb,
+								 const ResultCacheKey *params1,
+								 const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot	 *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid			*collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])			/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+								  collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used; instead, the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int				numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple   *tuple = entry->tuplehead;
+	uint64				freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey	   *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* XXX I don't really plan on keeping this */
+	{
+		int i, count;
+		uint64 mem = 0;
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+
+				ResultCacheTuple   *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool					specialkey_intact = true;		/* for now */
+	dlist_mutable_iter		iter;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey   *key = dlist_container(ResultCacheKey, lru_node,
+												iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this
+		 * LRU entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we set
+		 * 'specialkey_intact' to false to inform the caller that the entry
+		 * has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey		   *key;
+	ResultCacheEntry	   *entry;
+	MemoryContext			oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Look up or add an entry for the current parameters.  No need to pass a
+	 * valid key since the hash function uses rcstate's probeslot, which we
+	 * populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/* Move existing entry to the tail of the LRU list */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple	   *tuple;
+	ResultCacheEntry	   *entry = rcstate->entry;
+	MemoryContext			oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+		rcstate->last_tuple = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		rcstate->last_tuple->next = tuple;
+		rcstate->last_tuple = tuple;
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused
+		 * the code in simplehash.h to shuffle elements to earlier buckets in
+		 * the hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
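+/*
+ * ExecResultCache
+ *		Return the next tuple for the current scan parameters, either from
+ *		the cache or, on a cache miss, by reading from the outer subnode and
+ *		caching its tuples along the way.  Driven by the RC_* state machine
+ *		defined above.
+ */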
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1; /* stats update */
+
+					/* Fetch the first cached tuple, if there is one */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* No tuples in this cache entry. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1; /* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to
+						 * failure to free enough cache space, so ensure we
+						 * don't do anything here that assumes it worked.
+						 * There's no need to go into bypass mode here as
+						 * we're setting rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+						!cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1; /* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output. */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.
+					 * XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1;			/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+			/*
+			 * We've already returned NULL for this scan, but just in case
			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	} /* switch */
+}
+
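+/*
+ * ExecInitResultCache
+ *		Initialize the ResultCacheState node, including the hash functions,
+ *		equality expression and hash table used to implement the cache.
+ */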
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												   &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations;	/* Just point directly to the plan data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = work_mem * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark whether we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match.  For example, a join operator performing a
+	 * unique join is able to skip to the next outer tuple after getting the
+	 * first matching inner tuple.  In this case, the cache entry is complete
+	 * after getting the first tuple, and this lets us mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/*
+	 * Allocate and set up the actual cache.  We'll just use 1024 buckets if
+	 * the planner failed to come up with a better value.
+	 */
+	build_hash_table(rcstate, node->est_entries > 0 ? node->est_entries :
+					 1024);
+
+	return rcstate;
+}
+
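+/*
+ * ExecEndResultCache
+ *		Shut down the node, releasing the memory used by the cache and
+ *		shutting down the outer subplan.
+ */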
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the
+	 * main process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
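+/*
+ * ExecReScanResultCache
+ *		Prepare to scan again with what is most likely a new set of
+ *		parameters.  The cache itself is kept intact; we only reset the state
+ *		machine and the pointers used during the previous scan.
+ */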
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		   sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+					+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 89c409de66..2c3426d7cc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -927,6 +927,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4937,6 +4964,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..27cc4c1864 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -836,6 +836,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3809,6 +3839,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4043,6 +4076,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..d5931b1651 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6da0dcd61c..404f337bc9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4090,6 +4090,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5a7f5afb94..76c21d6011 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -132,6 +133,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2296,6 +2298,148 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines and returns the estimated cost of using a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+					  ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor will
+	 * use.  If we leave this at zero the executor will just choose the size
+	 * itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total of this node and how
+	 * many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
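+
+	/*
+	 * As an illustrative example, with calls = 1000, ndistinct = 10 and room
+	 * for at least 10 entries in the cache, the formula above gives
+	 * 1.0 / 10 * 10 - 10 / 1000 = 0.99, i.e. we expect around 99% of rescans
+	 * to be served from the cache.
+	 */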
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * also add on a cpu_operator_cost to account for a cache lookup.  This
+	 * happens regardless of whether it's a cache hit or not.
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4023,6 +4167,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..53f259fa55 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,162 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows,
+ *		including looking for volatile functions in the inner side of the
+ *		join.  Also, fetch outer side exprs and check for valid hashable
+ *		equality operator for each outer expr.  Returns true and sets the
+ *		'param_exprs' and 'operators' output parameters if the caching is
+ *		possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	List	   *clauses = param_info->ppi_clauses;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	Assert(list_length(clauses) > 0);
+
+	foreach(lc, clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		OpExpr	   *opexpr;
+		TypeCacheEntry *typentry;
+		Node	   *expr;
+
+		opexpr = (OpExpr *) rinfo->clause;
+
+		/* ppi_clauses should always meet this requirement */
+		if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+			!clause_sides_match_join(rinfo, outerrel, innerrel))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		if (rinfo->outer_is_left)
+			expr = (Node *) list_nth(opexpr->args, 0);
+		else
+			expr = (Node *) list_nth(opexpr->args, 1);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/* We can only have a result cache when there's some kind of cache key */
+	if (inner_path->param_info == NULL ||
+		inner_path->param_info->ppi_clauses == NIL)
+		return NULL;
+
+	/*
+	 * We can't use a result cache when a lateral join var is required from
+	 * somewhere else other than the inner side of the join.
+	 *
+	 * XXX maybe we can just include lateral_vars from above this in the
+	 * result cache's keys?  Not today though. It seems likely to reduce cache
+	 * hits which may make it very seldom worthwhile.
+	 */
+	if (!bms_is_subset(innerrel->lateral_relids, innerrel->relids))
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean resultcache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -376,6 +543,8 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+	Path	   *inner_cache_path;
+	bool		added_path = false;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -458,12 +627,92 @@ try_nestloop_path(PlannerInfo *root,
 									  extra->restrictlist,
 									  pathkeys,
 									  required_outer));
+		added_path = true;
+	}
+
+	/*
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
+	 */
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
+
+	if (inner_cache_path == NULL)
+	{
+		if (!added_path)
+			bms_free(required_outer);
+		return;
+	}
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+
+				if (!added_path)
+					bms_free(required_outer);
+				return;
+			}
+		}
+
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  extra,
+									  outer_path,
+									  inner_cache_path,
+									  extra->restrictlist,
+									  pathkeys,
+									  required_outer));
+		added_path = true;
 	}
 	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
+	}
+
+	if (!added_path)
 	{
 		/* Waste no memory when we reject a path here */
 		bms_free(required_outer);
 	}
+
 }
 
 /*
@@ -481,6 +730,9 @@ try_partial_nestloop_path(PlannerInfo *root,
 						  JoinPathExtraData *extra)
 {
 	JoinCostWorkspace workspace;
+	RelOptInfo *innerrel = inner_path->parent;
+	RelOptInfo *outerrel = outer_path->parent;
+	Path	   *inner_cache_path;
 
 	/*
 	 * If the inner path is parameterized, the parameterization must be fully
@@ -492,7 +744,6 @@ try_partial_nestloop_path(PlannerInfo *root,
 	if (inner_path->param_info != NULL)
 	{
 		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
-		RelOptInfo *outerrel = outer_path->parent;
 		Relids		outerrelids;
 
 		/*
@@ -511,41 +762,114 @@ try_partial_nestloop_path(PlannerInfo *root,
 
 	/*
 	 * Before creating a path, get a quick lower bound on what it is likely to
-	 * cost.  Bail out right away if it looks terrible.
+	 * cost.  Don't bother if it looks terrible.
 	 */
 	initial_cost_nestloop(root, &workspace, jointype,
 						  outer_path, inner_path, extra);
-	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
-		return;
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+		{
+			inner_path = reparameterize_path_by_child(root, inner_path,
+													  outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!inner_path)
+				return;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
 
 	/*
-	 * If the inner path is parameterized, it is parameterized by the topmost
-	 * parent of the outer rel, not the outer rel itself.  Fix that.
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
 	 */
-	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
-	{
-		inner_path = reparameterize_path_by_child(root, inner_path,
-												  outer_path->parent);
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
 
+	if (inner_cache_path == NULL)
+		return;
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
 		/*
-		 * If we could not translate the path, we can't create nest loop path.
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
 		 */
-		if (!inner_path)
-			return;
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+				return;
+			}
+			else
+				inner_cache_path = reparameterize_path;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_cache_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
+	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
 	}
 
-	/* Might be good enough to be worth trying, so let's try it. */
-	add_partial_path(joinrel, (Path *)
-					 create_nestloop_path(root,
-										  joinrel,
-										  jointype,
-										  &workspace,
-										  extra,
-										  outer_path,
-										  inner_path,
-										  extra->restrictlist,
-										  pathkeys,
-										  NULL));
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..a184ba5bd0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1516,6 +1529,55 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
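+	/*
+	 * The cache key expressions may contain Vars belonging to the outer
+	 * relation; convert those into nestloop Params so that they can be
+	 * evaluated at execution time.
+	 */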
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6341,6 +6403,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6929,6 +7013,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6974,6 +7059,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e946..13d1af1df1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -677,6 +677,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			break;
 
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 9a8f738c9d..8e5703aeef 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -135,6 +136,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true, set 'operators' to the list of hash equality operators
+ *		for the given parameters, and populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are left unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could reduce the number of
+	 * evaluations of volatile functions that have side effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
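+	/*
+	 * For each parameter that the subplan requires from the outer query,
+	 * build a matching PARAM_EXEC Param and check that the parameter's type
+	 * has both a hash function and an equality operator.
+	 */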
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -232,6 +301,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKs, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless.  It might be better to do this only when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *cache_path;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			cache_path = create_resultcache_path(root,
+												 best_path->parent,
+												 best_path,
+												 param_exprs,
+												 operators,
+												 false,
+												 -1);
+			best_path = (Path *) cache_path;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
@@ -2685,6 +2788,13 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			/* XXX Check this is correct */
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e528e05459..6cf18a6803 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1551,6 +1551,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
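+	/*
+	 * Cached tuples are replayed in the same order in which they were
+	 * received from the subpath, so the subpath's pathkeys are preserved.
+	 */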
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3849,6 +3898,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4067,6 +4127,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index de87ad6ef7..19838de16d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1016,6 +1016,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of caching results from parameterized plan nodes."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..f202e15101 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -360,6 +360,7 @@
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
+#enable_resultcache = on
 #enable_mergejoin = on
 #enable_nestloop = on
 #enable_parallel_append = on
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..48dd235bfd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -263,6 +263,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cf832d7f90..f1d93dac08 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1982,6 +1983,69 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of times we've skipped the subnode
+								 * scan due to tuples already being cached */
+	uint64		cache_misses;	/* number of times we've had to scan the
+								 * subnode to fetch tuples */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct resultcache_hash *hashtable; /* hash table cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* limit the size of the cache to this (bytes) */
+	uint64		mem_lowerlimit; /* reduce memory usage below this when we free
+								 * up space */
+	MemoryContext tableContext; /* memory context for actual cache */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..94ab62f318 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -241,6 +243,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..671fbe81e8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache over a
+ * parameterized path which stores the path's tuples so that the underlying
+ * node need not be rescanned for parameter values that are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects us to hold, or 0 if unknown
+								 */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..0512f1ae1c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,26 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each cache key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects us to hold */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6141654e47..21d3dbdad4 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..816fb3366f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -2577,6 +2579,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2595,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 03ada654bb..d78be811d9 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -742,19 +742,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..d5a8eba085 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -2973,8 +2975,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2984,11 +2986,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
@@ -3510,8 +3514,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3521,17 +3525,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3541,9 +3547,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4890,14 +4898,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..5c826792f5 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
-                 Output: (hjtest_1.b * 5)
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
-                       Output: (hjtest_2.c * 5)
+                 ->  Result Cache
+                       Output: ((hjtest_2.c * 5))
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
-           Output: (hjtest_1.b * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_1.b * 5))
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
+                 ->  Result Cache
+                       Output: ((hjtest_1.b * 5))
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
+         SubPlan 2
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
                  ->  Result
                        Output: (hjtest_1.b * 5)
-         SubPlan 2
-           ->  Result
-                 Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
-           Output: (hjtest_2.c * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_2.c * 5))
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 687cf8c5f4..3fa4bf4525 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2060,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2070,36 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(31 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2145,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2182,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                          explain_parallel_append                                           
+------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2220,30 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           Worker 0:  Cache Hits: N  Cache Misses: N Cache Evictions: 0  Cache Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..3a920c083a
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,100 @@
+-- Perform tests on the Result Cache node.
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.twenty
+           Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 5339  Cache Misses: 4661 Cache Evictions: 4056  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=4661)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=4661)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+RESET work_mem;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on tenk1 t1 (actual rows=3333 loops=3)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Cache Hits: 9000  Cache Misses: 1000 Cache Evictions: 0  Cache Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(12 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=10000 loops=1)
+         ->  Seq Scan on tenk1 t2 (actual rows=10000 loops=1)
+         ->  Result Cache (actual rows=1 loops=10000)
+               Cache Key: t2.twenty
+               Cache Hits: 9980  Cache Misses: 20 Cache Evictions: 0  Cache Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(9 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+ count |        avg         
+-------+--------------------
+ 10000 | 9.5000000000000000
+(1 row)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..0d2b3c5c10 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1167,9 +1171,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 1c5d80da32..edb775dcf8 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -844,19 +844,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..1bd175d992 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -87,10 +87,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..317cd56eb2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..04f0473b92 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -198,6 +198,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..b0bc88140f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 93ef9dc1f3..d99e762295 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Cache Hits: \d+', 'Cache Hits: N');
+        ln := regexp_replace(ln, 'Cache Misses: \d+', 'Cache Misses: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..ecf857c7f6
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,32 @@
+-- Perform tests on the Result Cache node.
+
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty) FROM tenk1 t1;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET work_mem;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand) FROM tenk1 t1;
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty;
-- 
2.21.0.windows.1

#17Andres Freund
andres@anarazel.de
In reply to: David Rowley (#16)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2020-08-04 10:05:25 +1200, David Rowley wrote:

I'd like to push the 0002 patch quite soon as I think it's an
improvement to simplehash.h regardless of if we get Result Cache. It
reuses the SH_LOOKUP function for deletes. Also, if we ever get around
to giving up performing a lookup if we get too far away from the
optimal bucket, then that would only need to appear in one location
rather than in two.

Andres, or anyone, any objections to me pushing 0002?

I think it'd be good to add a warning that, unless one is very careful,
no other hashtable modifications are allowed between lookup and
modification. E.g. something like
a = foobar_lookup();foobar_insert();foobar_delete();
will occasionally go wrong...
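
To spell that out as a minimal standalone sketch (the instantiation and
all of the names here are invented purely for illustration, and
delete_item is the function the 0002 patch adds):

/* Invented simplehash instantiation, just to show the hazard. */
#include "postgres.h"

typedef struct demo_entry
{
	uint32		key;
	char		status;			/* member required by simplehash.h */
} demo_entry;

#define SH_PREFIX demo
#define SH_ELEMENT_TYPE demo_entry
#define SH_KEY_TYPE uint32
#define SH_KEY key
#define SH_HASH_KEY(tb, key) (key)	/* identity hash is fine for a demo */
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static void
unsafe_pattern(demo_hash *ht)
{
	bool		found;
	demo_entry *entry = demo_lookup(ht, 42);	/* points into ht->data */

	demo_insert(ht, 43, &found);	/* may grow the table or shift entries,
									 * leaving 'entry' dangling or pointing
									 * at a different element */

	if (entry != NULL)
		demo_delete_item(ht, entry);	/* can now delete the wrong entry */
}

The pointer returned by lookup is only valid until the next insert or
delete on the same table, which is exactly the trap being warned about
here.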

-		/* TODO: return false; if distance too big */
+/*
+ * Perform hash table lookup on 'key', delete the entry belonging to it and
+ * return true.  Returns false if no item could be found relating to 'key'.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+{
+	SH_ELEMENT_TYPE *entry = SH_LOOKUP(tb, key);
-		curelem = SH_NEXT(tb, curelem, startelem);
+	if (likely(entry != NULL))
+	{
+		/*
+		 * Perform deletion and also the relocation of subsequent items which
+		 * are not in their optimal position but can now be moved up.
+		 */
+		SH_DELETE_ITEM(tb, entry);
+		return true;
}
+
+	return false;		/* Can't find 'key' */
}

You mentioned on IM that there's a slowdown with gcc. I wonder if this
could partially be responsible. Does SH_DELETE inline LOOKUP and
DELETE_ITEM? And does the generated code end up reloading entry-> or
tb-> members?

When the SH_SCOPE isn't static *, then IIRC gcc on unixes can't rely on
the called function actually being the function defined in the same
translation unit (unless -fno-semantic-interposition is specified).

Hm, but you said that this happens in tidbitmap.c, and there all
referenced functions are local statics. So that's not quite the
explanation I was thinking it was...

Hm. I also wonder whether we currently (i.e. in the existing code)
unnecessarily end up reloading tb->data a bunch of times, because we do
the access to ->data as
SH_ELEMENT_TYPE *entry = &tb->data[curelem];

Think we should instead store tb->data in a local variable.

Greetings,

Andres Freund

#18Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#1)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, May 20, 2020 at 7:44 AM David Rowley <dgrowleyml@gmail.com> wrote:

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names I mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

This is cool work; I am going to bikeshed on the name for a minute. I
don't think Result Cache is terrible, but I have two observations:

1. It might invite confusion with a feature of some other database
systems where they cache the results of entire queries and try to
reuse the entire result set.

2. The functionality reminds me a bit of a Materialize node, except
that instead of overflowing to disk, we throw away cache entries, and
instead of caching just one result, we potentially cache many.

I can't really think of a way to work Materialize into the node name
and I'm not sure it would be the right idea anyway. But I wonder if
maybe a name like "Parameterized Cache" would be better? That would
avoid confusion with any other use of the phrase "result cache"; also,
an experienced PostgreSQL user might be more likely to guess how a
"Parameterized Cache" is different from a "Materialize" node than they
would be if it were called a "Result Cache".

Just my $0.02,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Peter Geoghegan
In reply to: David Rowley (#1)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, May 20, 2020 at 4:44 AM David Rowley <dgrowleyml@gmail.com> wrote:

Does it seem like something we might want for PG14?

Minor terminology issue: "Hybrid Hash Join" is a specific hash join
algorithm which is unrelated to what you propose to do here. I hope
that confusion can be avoided, possibly by not using the word hybrid
in the name.

--
Peter Geoghegan

#20David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#18)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 6 Aug 2020 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:

This is cool work; I am going to bikeshed on the name for a minute. I
don't think Result Cache is terrible, but I have two observations:

Thanks

1. It might invite confusion with a feature of some other database
systems where they cache the results of entire queries and try to
reuse the entire result set.

Yeah. I think "Cache" is good to keep, but I'm pretty much in favour
of swapping "Result" for something else. It's a bit too close to the
"Result" node in name, but too distant from it in everything else.

2. The functionality reminds me a bit of a Materialize node, except
that instead of overflowing to disk, we throw away cache entries, and
instead of caching just one result, we potentially cache many.

I can't really think of a way to work Materialize into the node name
and I'm not sure it would be the right idea anyway. But I wonder if
maybe a name like "Parameterized Cache" would be better?

Yeah, I think that name is better. The only downside as far as I can
see is the length of it.

I'll hold off a bit before doing any renaming though to see what other
people think. I just feel bikeshedding on the name is something that's
going to take up quite a bit of time and effort with this. I plan to
rename it at most once.

Thanks for the comments

David

#21David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#17)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 6 Aug 2020 at 05:44, Andres Freund <andres@anarazel.de> wrote:

Andres, or anyone, any objections to me pushing 0002?

I think it'd be good to add a warning that, unless one is very careful,
no other hashtable modifications are allowed between lookup and
modification. E.g. something like
a = foobar_lookup();foobar_insert();foobar_delete();
will occasionally go wrong...

Good point. I agree. An insert could grow the table. Additionally,
another delete could shuffle elements back to a more optimal position
so we couldn't do any inserts or deletes between the lookup of the
item to delete and the actual delete.

-             /* TODO: return false; if distance too big */
+/*
+ * Perform hash table lookup on 'key', delete the entry belonging to it and
+ * return true.  Returns false if no item could be found relating to 'key'.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
+{
+     SH_ELEMENT_TYPE *entry = SH_LOOKUP(tb, key);
-             curelem = SH_NEXT(tb, curelem, startelem);
+     if (likely(entry != NULL))
+     {
+             /*
+              * Perform deletion and also the relocation of subsequent items which
+              * are not in their optimal position but can now be moved up.
+              */
+             SH_DELETE_ITEM(tb, entry);
+             return true;
}
+
+     return false;           /* Can't find 'key' */
}

You mentioned on IM that there's a slowdown with gcc. I wonder if this
could partially be responsible. Does SH_DELETE inline LOOKUP and
DELETE_ITEM? And does the generated code end up reloading entry-> or
tb-> members?

Yeah both the SH_LOOKUP and SH_DELETE_ITEM are inlined.

I think the difference might be coming from the fact that I have to
calculate the bucket index from the bucket pointer using:

/* Calculate the index of 'entry' */
curelem = entry - &tb->data[0];

There is some slight change in the instructions due to the change in the
hash lookup part of SH_DELETE, but in the guts of the code generated for
SH_DELETE_ITEM, there's a set of instructions that are purely
additional:

subq %r10, %rax
sarq $4, %rax
imull $-1431655765, %eax, %eax
leal 1(%rax), %r8d

For testing's sake, I changed curelem = entry - &tb->data[0]; to just
curelem = 10; and those 4 instructions disappear.

I can't really work out what the imull constant means. In binary, that
number is 10101010101010101010101010101011
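
One guess at what that is (sketched below; the element size is an
assumption I haven't checked against the patch): 0xAAAAAAAB, which is
-1431655765 as a signed 32-bit value, is the multiplicative inverse of 3
modulo 2^32, so the sarq $4 followed by that imull would be gcc turning
the byte offset from the pointer subtraction into an index by dividing
by 16 and then by 3, i.e. by a 48-byte SH_ELEMENT_TYPE.

#include <stdio.h>
#include <stdint.h>

/*
 * Standalone sketch of the exact-division trick described above.  For any
 * uint32_t that is an exact multiple of 3, multiplying by 0xAAAAAAAB (the
 * inverse of 3 mod 2^32) gives the same result as dividing by 3, so a byte
 * offset for a 48-byte element can become an array index with just a shift
 * and a multiply.  The 48-byte element size is assumed, not taken from the
 * patch.
 */
int
main(void)
{
	uint32_t	byte_offset = 10 * 48;	/* pretend 'entry' is element 10 */
	uint32_t	index = (byte_offset >> 4) * 0xAAAAAAABu;

	printf("%u\n", index);		/* prints 10 */
	return 0;
}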

I wonder if it might be easier if I just leave SH_DELETE alone and
just add a new function to delete with the known element.

When the SH_SCOPE isn't static *, then IIRC gcc on unixes can't rely on
the called function actually being the function defined in the same
translation unit (unless -fno-semantic-interposition is specified).

Hm, but you said that this happens in tidbitmap.c, and there all
referenced functions are local statics. So that's not quite the
explanation I was thinking it was...

Hm. I also wonder whether we currently (i.e. in the existing code)
unnecessarily end up reloading tb->data a bunch of times, because we do
the access to ->data as
SH_ELEMENT_TYPE *entry = &tb->data[curelem];

Think we should instead store tb->data in a local variable.

At the start of SH_DELETE_ITEM I tried doing:

SH_ELEMENT_TYPE *buckets = tb->data;

then referencing that local var instead of tb->data in the body of the
loop. No meaningful improvements to the assembly. It just seems to
adjust which registers are used.

With the local var I see:

addq %r9, %rdx

but in the version without the local variable I see:

addq 24(%rdi), %rdx

the data array is 24 bytes into the SH_TYPE struct. So it appears that,
without the local var, we just calculate the address of that field by
adding 24 to the tb pointer, and with the local var we load the value
from the register that's storing it instead.

David

#22Andres Freund
andres@anarazel.de
In reply to: David Rowley (#15)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2020-07-09 10:25:14 +1200, David Rowley wrote:

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:

On 2020-05-20 23:44:27 +1200, David Rowley wrote:

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names I mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

But that's because we dispatch ExecReScan mechanically down to every
single executor node. That doesn't determine how many nodes we would need
to modify to include explicit caching? What am I missing?

Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
integration?

Greetings,

Andres Freund

#23David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#22)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres@anarazel.de> wrote:

On 2020-07-09 10:25:14 +1200, David Rowley wrote:

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

But that's because we dispatch ExecReScan mechanically down to every
single executor node. That doesn't determine how many nodes would need
to modify to include explicit caching? What am I missing?

Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
integration?

hmm, I think you're right there about those two node types. I'm just
not sure you're right about overloading these node types to act as a
cache. How would you inform users via EXPLAIN ANALYZE of how many
cache hits/misses occurred? What would you use to disable it for an
escape hatch for when the planner makes a bad choice about caching?

David

#24Andres Freund
andres@anarazel.de
In reply to: David Rowley (#23)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2020-08-11 17:23:42 +1200, David Rowley wrote:

On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres@anarazel.de> wrote:

On 2020-07-09 10:25:14 +1200, David Rowley wrote:

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

But that's because we dispatch ExecReScan mechanically down to every
single executor node. That doesn't determine how many nodes would need
to modify to include explicit caching? What am I missing?

Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
integration?

hmm, I think you're right there about those two node types. I'm just
not sure you're right about overloading these node types to act as a
cache.

I'm not 100% either, to be clear. I am just acutely aware that adding
entire nodes is pretty expensive, and that there's, afaict, no need to
have arbitrary (i.e. pointer to function) type callbacks to point to the
cache.

How would you inform users via EXPLAIN ANALYZE of how many
cache hits/misses occurred?

Similar to how we display memory for sorting etc.

What would you use to disable it for an
escape hatch for when the planner makes a bad choice about caching?

Isn't that *easier* when embedding it into the node? There's no nice way
to remove an intermediary executor node entirely, but it's trivial to
have an if statement like
if (node->cache && upsert_cache(node->cache, param))

Greetings,

Andres Freund

#25David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#24)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 11 Aug 2020 at 17:44, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-08-11 17:23:42 +1200, David Rowley wrote:

On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres@anarazel.de> wrote:

On 2020-07-09 10:25:14 +1200, David Rowley wrote:

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

But that's because we dispatch ExecReScan mechanically down to every
single executor node. That doesn't determine how many nodes would need
to modify to include explicit caching? What am I missing?

Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
integration?

hmm, I think you're right there about those two node types. I'm just
not sure you're right about overloading these node types to act as a
cache.

I'm not 100% either, to be clear. I am just acutely aware that adding
entire nodes is pretty expensive, and that there's, afaict, no need to
have arbitrary (i.e. pointer to function) type callbacks to point to the
cache.

Perhaps you're right, but I'm just not convinced of it. I feel there's
a certain air of magic in taking a node with a good name and a
reputation for doing one thing and suddenly adding new functionality
that makes it perform massively differently.

A counterexample to your argument is that Materialize is a node type.
There's only a limited number of places where that node is used. One of
those places can be on the inside of a non-parameterized nested loop.
Your argument for having Nested Loop do the caching would also indicate
that Materialize should be part of Nested Loop instead of a node
itself. There are a few other places Materialize is used, e.g. scrollable
cursors, but in that regard, you could say that the caching should be
handled in ExecutePlan(). I just don't think it should be, as I don't
think Result Cache should be part of any other node or code.

Another problem I see with overloading nodeSubplan and nodeNestloop is
that we don't really document our executor nodes today, so unless this
patch starts a new standard for that, there's not exactly a good place
to mention that parameterized nested loops may now cache results from
the inner scan.

I do understand what you mean about the additional node overhead. I saw
that in my adventures with INNER JOIN removals a few years ago. I hope
it helps that I've tried to code the planner so that, for nested loops,
it only uses a Result Cache node when it thinks it'll speed things up.
That decision is of course based on having good statistics, which
might not be the case. I don't quite have that luxury with subplans
due to lack of knowledge of the outer plan when planning the subquery.

How would you inform users via EXPLAIN ANALYZE of how many
cache hits/misses occurred?

Similar to how we display memory for sorting etc.

I was more thinking of how bizarre it would be to see Nested Loop and
SubPlan report cache statistics. It may appear quite magical to users
to see EXPLAIN ANALYZE mentioning that their Nested Loop is now
reporting something about cache hits.

What would you use to disable it for an
escape hatch for when the planner makes a bad choice about caching?

Isn't that *easier* when embedding it into the node? There's no nice way
to remove an intermediary executor node entirely, but it's trivial to
have an if statement like
if (node->cache && upsert_cache(node->cache, param))

What I meant was more that it might not make sense to keep the
enable_resultcache GUC if the caching were part of the existing nodes.
I think people are pretty used to the enable_* GUCs corresponding to
an executor node whose name roughly matches the name of the GUC. In
this case, without a Result Cache node, enable_resultcache would not be
self-documenting. However, perhaps we could have two new GUCs instead:
enable_nestloop_caching and enable_subplan_caching. We don't currently
have any other enable_* GUCs that are node modifiers. We did have
enable_hashagg_disk until a few weeks ago. Nobody seemed to like that,
but perhaps there were other reasons for people not to like it besides
it being a node modifier GUC.

I'm wondering if anyone else has any thoughts on this?

David

#26David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#5)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, 25 May 2020 at 19:53, David Rowley <dgrowleyml@gmail.com> wrote:

I didn't quite get the LATERAL support done in the version I
sent. For now, I'm not considering adding a Result Cache node if there
are lateral vars in any location other than the inner side of the
nested loop join. I think it'll just be a few lines to make it work
though. I wanted to get some feedback before going to too much more
trouble to make all cases work.

I've now changed the patch so that it supports adding a Result Cache
node to LATERAL joins.

e.g.

regression=# explain analyze select count(*) from tenk1 t1, lateral (select x from generate_Series(1,t1.twenty) x) gs;
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=150777.53..150777.54 rows=1 width=8) (actual time=22.191..22.191 rows=1 loops=1)
   ->  Nested Loop (cost=0.01..125777.53 rows=10000000 width=0) (actual time=0.010..16.980 rows=95000 loops=1)
         ->  Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=10000 width=4) (actual time=0.003..0.866 rows=10000 loops=1)
         ->  Result Cache (cost=0.01..10.01 rows=1000 width=0) (actual time=0.000..0.001 rows=10 loops=10000)
               Cache Key: t1.twenty
               Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
               ->  Function Scan on generate_series x (cost=0.00..10.00 rows=1000 width=0) (actual time=0.001..0.002 rows=10 loops=20)
 Planning Time: 0.046 ms
 Execution Time: 22.208 ms
(9 rows)

Time: 22.704 ms
regression=# set enable_resultcache=0;
SET
Time: 0.367 ms
regression=# explain analyze select count(*) from tenk1 t1, lateral (select x from generate_Series(1,t1.twenty) x) gs;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=225445.00..225445.01 rows=1 width=8) (actual time=35.578..35.579 rows=1 loops=1)
   ->  Nested Loop (cost=0.00..200445.00 rows=10000000 width=0) (actual time=0.008..30.196 rows=95000 loops=1)
         ->  Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=10000 width=4) (actual time=0.002..0.905 rows=10000 loops=1)
         ->  Function Scan on generate_series x (cost=0.00..10.00 rows=1000 width=0) (actual time=0.001..0.002 rows=10 loops=10000)
 Planning Time: 0.031 ms
 Execution Time: 35.590 ms
(6 rows)

Time: 36.027 ms
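
To read those numbers: t1.twenty only has 20 distinct values, so across
the 10,000 outer rows the cache records 20 misses (one per distinct
parameter value) and 9,980 hits, and the inner Function Scan runs just
20 times (loops=20) rather than 10,000. That is where the drop in
execution time from roughly 35.6 ms to 22.2 ms comes from.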

v7 patch series attached.

I also modified the 0002 patch so that, instead of modifying simplehash.h's
SH_DELETE function to have it call SH_LOOKUP and the newly added
SH_DELETE_ITEM function, I've just added an entirely new SH_DELETE_ITEM
and left SH_DELETE untouched. Trying to remove the code duplication
without having a negative effect on performance was tricky, and it
didn't save enough code to seem worthwhile.

I also did a round of polishing work, fixed a spelling mistake in a
comment and reworded a few other comments to make their meaning
clearer.

David

Attachments:

v7-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch (application/octet-stream)
From 5b7f411e829e7b902d807f09c7b6be4dcddc0fc5 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v7 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() which allows it
to set bits in a flags variable in order to pass back additional details to
the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..70f6fa2493 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2960,7 +2960,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index fda4b2c6e8..5a7f5afb94 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1864,7 +1864,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index bcb1bc6097..4f6ab5d635 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1986,6 +1986,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b40a112c25..64d8cfb89f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3719,7 +3719,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3744,7 +3745,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3760,7 +3762,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4778,7 +4780,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2ebd4ea332..20b2025272 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c1fc866cbf..e528e05459 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 00c7afc66f..2f1c1b8ec4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3365,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3373,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3580,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.25.1

v7-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch (application/octet-stream)
From e8b3189ec9b281fdeddcf13eb50216642f9ecb16 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v7 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an up-coming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..dc1f1df07e 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.25.1

v7-0003-Add-Result-Cache-executor-node.patch (application/octet-stream)
From 148e6971499d444b416d713d8c415b43966a596e Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v7 3/3] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.

We also support caching the results from correlated subqueries.  However,
currently, since subqueries are planned before their parent query, we are
unable to obtain any estimations on the cache hit ratio.  For now, we opt
to just always put a Result Cache above a suitable correlated subquery. In
the future, we may like to be smarter about that, but for now, the
overhead of using the Result Cache, even in cases where we never get a
cache hit is minimal.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   51 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  119 +-
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  132 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1122 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  149 +++
 src/backend/optimizer/path/joinpath.c         |  407 +++++-
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    1 +
 src/backend/optimizer/plan/subselect.c        |  110 ++
 src/backend/optimizer/util/pathnode.c         |   70 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    6 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/lib/simplehash.h                  |    8 +-
 src/include/nodes/execnodes.h                 |   67 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/include/utils/selfuncs.h                  |    6 +-
 src/test/regress/expected/aggregates.out      |    8 +-
 src/test/regress/expected/groupingsets.out    |   20 +-
 src/test/regress/expected/join.out            |  129 +-
 src/test/regress/expected/join_hash.out       |   72 +-
 src/test/regress/expected/partition_prune.out |  237 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/rowsecurity.out     |   20 +-
 src/test/regress/expected/select_parallel.out |   28 +-
 src/test/regress/expected/subselect.out       |   44 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    2 +
 src/test/regress/sql/resultcache.sql          |   54 +
 49 files changed, 3083 insertions(+), 286 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..b8eff40d92 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1581,6 +1581,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1607,6 +1608,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2118,22 +2120,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2914,10 +2919,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2928,8 +2936,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2939,11 +2947,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..00b3567e0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -480,10 +480,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a7177c550..9d909d3c07 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4737,6 +4737,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less frequently looked-up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 30e0a7ee7f..4cb3215728 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1279,6 +1281,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1970,6 +1975,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3043,6 +3052,114 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+
+	initStringInfo(&keystr);
+
+	/* XXX surely we'll always have more than one if we have a resultcache? */
+	useprefix = list_length(es->rtable) > 1;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do anything.  We needn't consider
+			 * cache hits as we'll always get a miss before a hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
@@ -3109,7 +3226,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 			if (aggstate->hash_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-					aggstate->hash_disk_used);
+								 aggstate->hash_disk_used);
 			}
 		}
 
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..68920ecd89 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..5e30623ad1 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3487,3 +3487,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use;
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..459e9dd3e9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -293,6 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -513,6 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -989,6 +998,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1058,6 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1350,6 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 01b7b926bf..fbbe667cc1 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..09b25ea184
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1122 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from it.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used cache entry
+ * from the cache.  We track recency by pushing new entries, and entries we
+ * look up, onto the tail of a doubly linked list.  This means that older
+ * items always bubble towards the head of this LRU list, which is where
+ * eviction starts.
+ *
+ * Sometimes our callers won't run their scans to completion.  For example, a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- materialize the result of a subplan
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * ResultCacheTuple: stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used; instead, the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* XXX I don't really plan on keeping this */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove it from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* The cache entry is void of any tuples. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.  XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = work_mem * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match; e.g. a join operator performing a unique join
+	 * can skip to the next outer tuple after getting the first matching
+	 * inner tuple.  In that case, the cache entry is complete after getting
+	 * the first tuple, so we can mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/*
+	 * Allocate and set up the actual cache.  We'll just use 1024 buckets if
+	 * the planner failed to come up with a better value.
+	 */
+	build_hash_table(rcstate, node->est_entries > 0 ? node->est_entries :
+					 1024);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 89c409de66..2c3426d7cc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -927,6 +927,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4937,6 +4964,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..27cc4c1864 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -836,6 +836,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3809,6 +3839,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4043,6 +4076,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..d5931b1651 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6da0dcd61c..404f337bc9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4090,6 +4090,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5a7f5afb94..e50844df9b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -132,6 +133,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2296,6 +2298,148 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide better estimates of how many cache entries we can store at
+	 * once, we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
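+
+	/*
+	 * For example (hypothetical numbers), with the default 4MB of work_mem
+	 * and est_entry_bytes of 1kB, est_cache_entries comes out at 4096.
+	 */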
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
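+	 *
+	 * For example (hypothetical numbers), if ndistinct is 500 but only
+	 * est_cache_entries = 100 entries fit in work_mem, evict_ratio comes out
+	 * at 1.0 - 100/500 = 0.8, i.e. around 80% of scans are expected to
+	 * trigger an eviction.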
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total of this node and how
+	 * many of those scans we expect to get a cache hit.
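+	 *
+	 * For example (hypothetical numbers), with calls = 1000, ndistinct = 100
+	 * and est_cache_entries >= 100, the hit_ratio comes out at
+	 * 100/100 - 100/1000 = 0.9, i.e. we expect roughly 90% of rescans to
+	 * find their parameter values already in the cache.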
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup.  This
+	 * will happen regardless of whether it's a cache hit or not.
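+	 *
+	 * For example (hypothetical numbers), with input_total_cost = 100.0 and
+	 * hit_ratio = 0.9, this comes out at 100.0 * 0.1 + cpu_operator_cost,
+	 * i.e. just over 10.0 with the default cost settings, before the
+	 * eviction and caching charges below are added.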
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must be also be proportioned according to the
+	 * Getting the first row must also be proportioned according to the
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4023,6 +4167,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..f4c76577ad 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,195 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, we fetch the outer side exprs and check for a valid hashable
+ * equality operator for each outer expr.  Returns true and sets the
+ * 'param_exprs' and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+			var_relids = pull_varnos((Node *) ((PlaceHolderVar *) expr)->phexpr);
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean resultcache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -376,6 +576,8 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+	Path	   *inner_cache_path;
+	bool		added_path = false;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -458,12 +660,92 @@ try_nestloop_path(PlannerInfo *root,
 									  extra->restrictlist,
 									  pathkeys,
 									  required_outer));
+		added_path = true;
+	}
+
+	/*
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
+	 */
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
+
+	if (inner_cache_path == NULL)
+	{
+		if (!added_path)
+			bms_free(required_outer);
+		return;
+	}
+
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+
+				if (!added_path)
+					bms_free(required_outer);
+				return;
+			}
+			else
+				inner_cache_path = reparameterize_path;
+		}
+
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  extra,
+									  outer_path,
+									  inner_cache_path,
+									  extra->restrictlist,
+									  pathkeys,
+									  required_outer));
+		added_path = true;
 	}
 	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
+	}
+
+	if (!added_path)
 	{
 		/* Waste no memory when we reject a path here */
 		bms_free(required_outer);
 	}
+
 }
 
 /*
@@ -481,6 +763,9 @@ try_partial_nestloop_path(PlannerInfo *root,
 						  JoinPathExtraData *extra)
 {
 	JoinCostWorkspace workspace;
+	RelOptInfo *innerrel = inner_path->parent;
+	RelOptInfo *outerrel = outer_path->parent;
+	Path	   *inner_cache_path;
 
 	/*
 	 * If the inner path is parameterized, the parameterization must be fully
@@ -492,7 +777,6 @@ try_partial_nestloop_path(PlannerInfo *root,
 	if (inner_path->param_info != NULL)
 	{
 		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
-		RelOptInfo *outerrel = outer_path->parent;
 		Relids		outerrelids;
 
 		/*
@@ -511,41 +795,114 @@ try_partial_nestloop_path(PlannerInfo *root,
 
 	/*
 	 * Before creating a path, get a quick lower bound on what it is likely to
-	 * cost.  Bail out right away if it looks terrible.
+	 * cost.  Don't bother if it looks terrible.
 	 */
 	initial_cost_nestloop(root, &workspace, jointype,
 						  outer_path, inner_path, extra);
-	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
-		return;
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
+
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+		{
+			inner_path = reparameterize_path_by_child(root, inner_path,
+													  outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!inner_path)
+				return;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
 
 	/*
-	 * If the inner path is parameterized, it is parameterized by the topmost
-	 * parent of the outer rel, not the outer rel itself.  Fix that.
+	 * See if we can build a result cache path for this inner_path. That might
+	 * make the nested loop cheaper.
 	 */
-	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
-	{
-		inner_path = reparameterize_path_by_child(root, inner_path,
-												  outer_path->parent);
+	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
+											inner_path, outer_path, jointype,
+											extra);
+
+	if (inner_cache_path == NULL)
+		return;
 
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_cache_path, extra);
+	if (add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+	{
 		/*
-		 * If we could not translate the path, we can't create nest loop path.
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
 		 */
-		if (!inner_path)
-			return;
+		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		{
+			Path	   *reparameterize_path;
+
+			reparameterize_path = reparameterize_path_by_child(root,
+															   inner_cache_path,
+															   outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create nest loop
+			 * path.
+			 */
+			if (!reparameterize_path)
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+				/* Waste no memory when we reject a path here */
+				list_free(rcpath->hash_operators);
+				list_free(rcpath->param_exprs);
+				pfree(rcpath);
+				return;
+			}
+			else
+				inner_cache_path = reparameterize_path;
+		}
+
+		/* Might be good enough to be worth trying, so let's try it. */
+		add_partial_path(joinrel, (Path *)
+						 create_nestloop_path(root,
+											  joinrel,
+											  jointype,
+											  &workspace,
+											  extra,
+											  outer_path,
+											  inner_cache_path,
+											  extra->restrictlist,
+											  pathkeys,
+											  NULL));
+	}
+	else
+	{
+		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
+
+		/* Waste no memory when we reject a path here */
+		list_free(rcpath->hash_operators);
+		list_free(rcpath->param_exprs);
+		pfree(rcpath);
 	}
 
-	/* Might be good enough to be worth trying, so let's try it. */
-	add_partial_path(joinrel, (Path *)
-					 create_nestloop_path(root,
-										  joinrel,
-										  jointype,
-										  &workspace,
-										  extra,
-										  outer_path,
-										  inner_path,
-										  extra->restrictlist,
-										  pathkeys,
-										  NULL));
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..45e211262a 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1516,6 +1529,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6341,6 +6404,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6929,6 +7014,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6974,6 +7060,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e946..a7af7dbed2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -679,6 +679,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
+		case T_ResultCache:
 		case T_Unique:
 		case T_SetOp:
 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6eb794669f..3e2c61b0a0 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -136,6 +137,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters, and populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side-effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -233,6 +302,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless.  It might be better to only do this when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *cache_path;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			cache_path = create_resultcache_path(root,
+												 best_path->parent,
+												 best_path,
+												 param_exprs,
+												 operators,
+												 false,
+												 -1);
+			best_path = (Path *) cache_path;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
@@ -2718,6 +2821,13 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			/* XXX Check this is correct */
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e528e05459..6cf18a6803 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1551,6 +1551,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3849,6 +3898,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4067,6 +4127,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index de87ad6ef7..19838de16d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1016,6 +1016,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of caching results from parameterized plan nodes."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..6bca3dfc9f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..48dd235bfd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -263,6 +263,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index dc1f1df07e..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -921,11 +921,11 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 	tb->members--;
 
 	/*
-	 * Backward shift following elements till either an empty element
-	 * or an element at its optimal position is encountered.
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
 	 *
-	 * While that sounds expensive, the average chain length is short,
-	 * and deletions would otherwise require tombstones.
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
 	 */
 	while (true)
 	{
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..30f66d5058 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1984,6 +1985,72 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* memory limit in bytes for the cache */
+	uint64		mem_lowerlimit; /* reduce memory usage to below this when we
+								 * free up space */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..94ab62f318 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -241,6 +243,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..79a4ad20dd 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..ac5685da64 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6141654e47..21d3dbdad4 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..816fb3366f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 455e1343ee..57ca9fda8d 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -70,9 +70,9 @@
  * callers to provide further details about some assumptions which were made
  * during the estimation.
  */
-#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
-											  * the DEFAULTs as defined above.
-											  */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -2577,6 +2579,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2595,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 03ada654bb..d78be811d9 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -742,19 +742,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..fec710e411 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -2973,8 +2975,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2984,11 +2986,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
@@ -3510,8 +3514,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3521,17 +3525,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3541,9 +3547,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4068,11 +4076,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
+   ->  Result Cache
          Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4109,15 +4120,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
-         ->  Limit
+         ->  Result Cache
                Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
          Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+         Cache Key: (i8.q1), t2.f1
+         ->  Limit
+               Output: ((i8.q1)), (t2.f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: (i8.q1), t2.f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4163,14 +4180,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4808,34 +4828,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4890,14 +4916,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..5c826792f5 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
-                 Output: (hjtest_1.b * 5)
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
-                       Output: (hjtest_2.c * 5)
+                 ->  Result Cache
+                       Output: ((hjtest_2.c * 5))
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
-           Output: (hjtest_1.b * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_1.b * 5))
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
+                 ->  Result Cache
+                       Output: ((hjtest_1.b * 5))
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
+         SubPlan 2
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
                  ->  Result
                        Output: (hjtest_1.b * 5)
-         SubPlan 2
-           ->  Result
-                 Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
-           Output: (hjtest_2.c * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_2.c * 5))
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 50d2a7e4b9..bab3b6401b 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2060,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2070,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2107,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2143,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2179,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2216,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..14e163a06f
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+   Recheck Cond: (unique1 < 1000)
+   Heap Blocks: exact=333
+   ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+         Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.twenty
+           Hits: 980  Misses: 20  Evictions: 0  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(13 rows)
+
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 9000  Misses: 1000  Evictions: 0  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 5339  Misses: 4661  Evictions: 4056  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=4661)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=4661)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+RESET work_mem;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.hundred = t1.hundred)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+                                           QUERY PLAN                                            
+-------------------------------------------------------------------------------------------------
+ Gather (actual rows=1000 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+         Recheck Cond: (unique1 < 1000)
+         Heap Blocks: exact=333
+         ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+               Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.hundred
+           Hits: 900  Misses: 100  Evictions: 0  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=100)
+                 ->  Index Only Scan using tenk1_hundred on tenk1 t2 (actual rows=100 loops=100)
+                       Index Cond: (hundred = t1.hundred)
+                       Heap Fetches: 0
+(16 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: 0  Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: 0  Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+RESET enable_hashjoin;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..0d2b3c5c10 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1167,9 +1171,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index b81923f2e7..baf778d95c 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -921,19 +921,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
@@ -1044,19 +1050,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..1bd175d992 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -87,10 +87,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..317cd56eb2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..04f0473b92 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -198,6 +198,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..b0bc88140f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1e904a8c5b..5ca0bcf238 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..52f614bdd4
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,54 @@
+-- Perform tests on the Result Cache node.
+
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+RESET work_mem;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.hundred = t1.hundred)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+RESET enable_hashjoin;
-- 
2.25.1

#27David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#25)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 18 Aug 2020 at 21:42, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 11 Aug 2020 at 17:44, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-08-11 17:23:42 +1200, David Rowley wrote:

On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres@anarazel.de> wrote:

On 2020-07-09 10:25:14 +1200, David Rowley wrote:

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced it's a good idea to introduce a separate executor node
for this. There's a fair bit of overhead in them, and they will only be
below certain types of nodes afaict. It seems like it'd be better to
pull the required calls into the nodes that do parametrized scans of
subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

But that's because we dispatch ExecReScan mechanically down to every
single executor node. That doesn't determine how many nodes would need
to modify to include explicit caching? What am I missing?

Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
integration?

hmm, I think you're right there about those two node types. I'm just
not sure you're right about overloading these node types to act as a
cache.

I'm not 100% either, to be clear. I am just acutely aware that adding
entire nodes is pretty expensive, and that there's, afaict, no need to
have arbitrary (i.e. pointer to function) type callbacks to point to the
cache.

Perhaps you're right, but I'm just not convinced of it. I feel
there's a certain air of magic in taking a node with a good name and
a reputation for doing one thing, then suddenly adding new
functionality that makes it perform massively differently.

[ my long babble removed]

I'm wondering if anyone else has any thoughts on this?

Just for anyone following along at home. The two variations would
roughly look like:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=10000 loops=1)
         ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
         ->  Result Cache (actual rows=1 loops=10000)
               Cache Key: t1.twenty
               Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
               ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
                     Index Cond: (unique1 = t1.twenty)
(8 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
                                       QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=10000 loops=1)
         Cache Key: t1.twenty  Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
         ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
         ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
               Index Cond: (unique1 = t1.twenty)
(6 rows)

and for subplans:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
                              QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
   SubPlan 1
     ->  Result Cache (actual rows=1 loops=10000)
           Cache Key: t1.twenty
           Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
           ->  Aggregate (actual rows=1 loops=20)
                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
                       Filter: (t1.twenty = twenty)
                       Rows Removed by Filter: 9500
(9 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
                              QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
   SubPlan 1
     Cache Key: t1.twenty  Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
     ->  Aggregate (actual rows=1 loops=20)
           ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
                 Filter: (t1.twenty = twenty)
                 Rows Removed by Filter: 9500
(7 rows)

I've spoken to one other person off-list about this and they said they
prefer Andres' suggestion on performance grounds: it's less overhead to
pull tuples through the plan, and executor startup/shutdown is cheaper
due to fewer nodes.

I don't object to making the change. I just object to making it only
to put it back again later when someone else speaks up that they'd
prefer to keep nodes modular and not overload them in obscure ways.

So other input is welcome. Is it too weird to overload SubPlan and
Nested Loop this way? Or okay to do that if it squeezes out a dozen
or so nanoseconds per tuple?

I did some analysis into the overhead of pulling tuples through an
additional executor node in [1].

David

[1]: /messages/by-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com

#28Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#27)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

st 19. 8. 2020 v 5:48 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:

On Tue, 18 Aug 2020 at 21:42, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 11 Aug 2020 at 17:44, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-08-11 17:23:42 +1200, David Rowley wrote:

On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres@anarazel.de>

wrote:

On 2020-07-09 10:25:14 +1200, David Rowley wrote:

On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de>

wrote:

I'm not convinced it's a good idea to introduce a separate

executor node

for this. There's a fair bit of overhead in them, and they

will only be

below certain types of nodes afaict. It seems like it'd be

better to

pull the required calls into the nodes that do parametrized

scans of

subsidiary nodes. Have you considered that?

I see 41 different node types mentioned in ExecReScan(). I don't
really think it would be reasonable to change all those.

But that's because we dispatch ExecReScan mechanically down to

every

single executor node. That doesn't determine how many nodes would

need

to modify to include explicit caching? What am I missing?

Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
integration?

hmm, I think you're right there about those two node types. I'm just
not sure you're right about overloading these node types to act as a
cache.

I'm not 100% either, to be clear. I am just acutely aware that adding
entire nodes is pretty expensive, and that there's, afaict, no need to
have arbitrary (i.e. pointer to function) type callbacks to point to

the

cache.

Perhaps you're right, but I'm just not convinced of it. I feel
there's a certain air of magic involved in any node that has a good
name and reputation for doing one thing that we suddenly add new
functionality to which causes it to perform massively differently.

[ my long babble removed]

I'm wondering if anyone else has any thoughts on this?

Just for anyone following along at home. The two variations would
roughly look like:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
QUERY PLAN

---------------------------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Nested Loop (actual rows=10000 loops=1)
-> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
-> Result Cache (actual rows=1 loops=10000)
Cache Key: t1.twenty
Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Index Scan using tenk1_unique1 on tenk1 t2 (actual
rows=1 loops=20)
Index Cond: (unique1 = t1.twenty)
(8 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
QUERY PLAN

---------------------------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Nested Loop (actual rows=10000 loops=1)
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0
Overflows: 0
-> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
-> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1
loops=20)
Index Cond: (unique1 = t1.twenty)
(6 rows)

and for subplans:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
SubPlan 1
-> Result Cache (actual rows=1 loops=10000)
Cache Key: t1.twenty
Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Aggregate (actual rows=1 loops=20)
-> Seq Scan on tenk1 t2 (actual rows=500 loops=20)
Filter: (t1.twenty = twenty)
Rows Removed by Filter: 9500
(9 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
SubPlan 1
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows:
0
-> Aggregate (actual rows=1 loops=20)
-> Seq Scan on tenk1 t2 (actual rows=500 loops=20)
Filter: (t1.twenty = twenty)
Rows Removed by Filter: 9500
(7 rows)

I've spoken to one other person off-list about this and they suggested
that they prefer Andres' suggestion on performance grounds that it's
less overhead to pull tuples through the plan and cheaper executor
startup/shutdowns due to fewer nodes.

I didn't do performance tests, which would be necessary, but I think
Andres' variant is a little bit more readable.

The performance is most important, but readability of EXPLAIN is
interesting too.

Regards

Pavel


I don't object to making the change. I just object to making it only
to put it back again later when someone else speaks up that they'd
prefer to keep nodes modular and not overload them in obscure ways.

So other input is welcome. Is it too weird to overload SubPlan and
Nested Loop this way? Or okay to do that if it squeezes out a dozen
or so nanoseconds per tuple?

I did some analysis into the overhead of pulling tuples through an
additional executor node in [1].

David

[1]: /messages/by-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#27)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

David Rowley <dgrowleyml@gmail.com> writes:

I don't object to making the change. I just object to making it only
to put it back again later when someone else speaks up that they'd
prefer to keep nodes modular and not overload them in obscure ways.

So other input is welcome. Is it too weird to overload SubPlan and
Nested Loop this way? Or okay to do that if it squeezes out a dozen
or so nanoseconds per tuple?

If you need somebody to blame it on, blame it on me - but I agree
that that is an absolutely horrid abuse of NestLoop. We might as
well reduce explain.c to a one-liner that prints "Here Be Dragons",
because no one will understand what this display is telling them.

I'm also quite skeptical that adding overhead to nodeNestloop.c
to support this would actually be a net win once you account for
what happens in plans where the caching is of no value.

regards, tom lane

#30David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#29)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 19 Aug 2020 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I don't object to making the change. I just object to making it only
to put it back again later when someone else speaks up that they'd
prefer to keep nodes modular and not overload them in obscure ways.

So other input is welcome. Is it too weird to overload SubPlan and
Nested Loop this way? Or okay to do that if it squeezes out a dozen
or so nanoseconds per tuple?

If you need somebody to blame it on, blame it on me - but I agree
that that is an absolutely horrid abuse of NestLoop. We might as
well reduce explain.c to a one-liner that prints "Here Be Dragons",
because no one will understand what this display is telling them.

Thanks for chiming in. I'm relieved it's not me vs everyone else anymore.

I'm also quite skeptical that adding overhead to nodeNestloop.c
to support this would actually be a net win once you account for
what happens in plans where the caching is of no value.

Agreed.

David

#31David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#28)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 19 Aug 2020 at 16:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:

st 19. 8. 2020 v 5:48 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
SubPlan 1
-> Result Cache (actual rows=1 loops=10000)
Cache Key: t1.twenty
Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Aggregate (actual rows=1 loops=20)
-> Seq Scan on tenk1 t2 (actual rows=500 loops=20)
Filter: (t1.twenty = twenty)
Rows Removed by Filter: 9500
(9 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
SubPlan 1
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Aggregate (actual rows=1 loops=20)
-> Seq Scan on tenk1 t2 (actual rows=500 loops=20)
Filter: (t1.twenty = twenty)
Rows Removed by Filter: 9500
(7 rows)

I didn't do performance tests, that should be necessary, but I think Andres' variant is a little bit more readable.

Thanks for chiming in on this. I was just wondering about the
readability part and what makes the one with the Result Cache node
less readable? I can think of a couple of reasons you might have this
view and just wanted to double-check what it is.

David

#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#27)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On 2020-Aug-19, David Rowley wrote:

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Nested Loop (actual rows=10000 loops=1)
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
-> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
Index Cond: (unique1 = t1.twenty)
(6 rows)

I think it doesn't look terrible in the SubPlan case -- it kinda makes
sense there -- but for nested loop it appears really strange.

On the performance aspect, I wonder what the overhead is, particularly
considering Tom's point of making these nodes more expensive for cases
with no caching. And also, as the JIT saga continues, aren't we going
to get plan trees recompiled too, at which point it won't matter much?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#32)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 20 Aug 2020 at 10:58, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On the performance aspect, I wonder what the overhead is, particularly
considering Tom's point of making these nodes more expensive for cases
with no caching.

It's likely small. I've not written any code but only thought about it
and I think it would be something like if (node->tuplecache != NULL).
I imagine that in simple cases the branch predictor would likely
realise the likely prediction fairly quickly and predict with 100%
accuracy, once learned. But it's perhaps possible that some other
branch shares the same slot in the branch predictor and causes some
conflicting predictions. The size of the branch predictor cache is
limited, of course. Certainly introducing new branches that
mispredict and cause a pipeline stall during execution would be a very
bad thing for performance. I'm unsure what would happen if there's
say, 2 Nested loops, one with caching = on and one with caching = off
where the number of tuples between the two is highly variable. I'm
not sure a branch predictor would handle that well given that the two
branches will be at the same address but have different predictions.
However, if the predictor was to hash in the stack pointer too, then
that might not be a problem. Perhaps someone with a better
understanding of modern branch predictors can share their insight
there.
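
To make that concrete, here's a rough sketch of what that branch might
look like inside the join node. The nl_tuplecache field and the
FetchCachedTuple() helper are just placeholder names for illustration,
not from any posted patch:

/*
 * Sketch only: fetch the next inner tuple, going through the cache when
 * this join was planned with caching enabled.
 */
static inline TupleTableSlot *
NestLoopFetchInner(NestLoopState *node, PlanState *innerPlan)
{
	if (node->nl_tuplecache != NULL)	/* caching enabled for this join? */
		return FetchCachedTuple(node->nl_tuplecache, innerPlan);

	/* otherwise just do the plain parameterized rescan of the inner plan */
	return ExecProcNode(innerPlan);
}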

And also, as the JIT saga continues, aren't we going
to get plan trees recompiled too, at which point it won't matter much?

I was thinking batch execution would be our solution to the node
overhead problem. We'll get there one day... we just need to finish
with the infinite other optimisations there are to do first.

David

#34Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#31)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

čt 20. 8. 2020 v 0:04 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:

On Wed, 19 Aug 2020 at 16:18, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

st 19. 8. 2020 v 5:48 odesílatel David Rowley <dgrowleyml@gmail.com>

napsal:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
SubPlan 1
-> Result Cache (actual rows=1 loops=10000)
Cache Key: t1.twenty
Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Aggregate (actual rows=1 loops=20)
-> Seq Scan on tenk1 t2 (actual rows=500 loops=20)
Filter: (t1.twenty = twenty)
Rows Removed by Filter: 9500
(9 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
SubPlan 1
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0

Overflows: 0

-> Aggregate (actual rows=1 loops=20)
-> Seq Scan on tenk1 t2 (actual rows=500 loops=20)
Filter: (t1.twenty = twenty)
Rows Removed by Filter: 9500
(7 rows)

I didn't do performance tests, that should be necessary, but I think

Andres' variant is a little bit more readable.

Thanks for chiming in on this. I was just wondering about the
readability part and what makes the one with the Result Cache node
less readable? I can think of a couple of reasons you might have this
view and just wanted to double-check what it is.

It is more compact - fewer rows, fewer nesting levels


David

#35Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#32)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2020-08-19 18:58:11 -0400, Alvaro Herrera wrote:

On 2020-Aug-19, David Rowley wrote:

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Nested Loop (actual rows=10000 loops=1)
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
-> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
Index Cond: (unique1 = t1.twenty)
(6 rows)

I think it doesn't look terrible in the SubPlan case -- it kinda makes
sense there -- but for nested loop it appears really strange.

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

On the performance aspect, I wonder what the overhead is, particularly
considering Tom's point of making these nodes more expensive for cases
with no caching.

I doubt it, due to being a well predictable branch. But it's also easy
enough to just have a different Exec* function for the caching and
non-caching case, should that turn out to be a problem.

And also, as the JIT saga continues, aren't we going to get plan trees
recompiled too, at which point it won't matter much?

That's a fair bit out, I think. And even then it'll only help for
queries that run long enough (eventually also often enough, if we get
prepared statement JITing) to be worth JITing.

Greetings,

Andres Freund

#36David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#35)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this. I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.
Then after a few iterations of that, we end up with a single node
in EXPLAIN that nobody can figure out what it does. "Here Be Dragons",
as Tom said. That might seem like a bit of an exaggeration, but it is
important to understand that this would start us down that path, and
the more steps you take down that path, the harder it is to return
from it.

Let's look at nodeProjectSet.c, for example, which I recall you spent
quite a bit of time painfully extracting the scattered logic to get it
into a single reusable node (69f4b9c85). I understand your motivation
was for JIT compilation and not to modularise the code, however, I
think the byproduct of that change of having all that code in one
executor node was a good change, and I'm certainly a fan of what it
allowed you to achieve with JIT. I really wouldn't like to put anyone
else in a position of having to extract out some complex logic that we
add to existing nodes in some future version of PostgreSQL. It might
seem quite innocent today, but add a few more years of development and
I'm sure things will get buried a little deeper.

I'm sure you know better than most that the day will come where we go
and start rewriting all of our executor node code to implement
something like batch execution. I'd imagine you'd agree that this job
would be easier if nodes were single-purpose, rather than overloaded
with a bunch of needless complexity that only Heath Robinson himself
could be proud of.

I find it bizarre that on one hand, for non-parameterized nested
loops, we can have the inner scan become materialized with a
Materialize node (I don't recall complaints about that) However, on
the other hand, for parameterized nested loops, we build the caching
into the Nested Loop node itself.

For the other arguments: I'm also struggling a bit to understand the
arguments that it makes EXPLAIN easier to read due to reduced nesting
depth. If that's the case, why don't we get rid of Hash below a Hash
Join? It seems nobody has felt strongly enough about that to go to the
trouble of writing the patch. We could certainly do work to reduce
nesting depth in EXPLAIN provided you're not big on what the plan
actually does. One line should be ok if you're not worried about
what's happening to your tuples. Unfortunately, that does not seem
very useful as it tends to be that people who do look at EXPLAIN do
actually want to know what the planner has decided to do and are
interested in what's happening to their tuples. Hiding away details
that can significantly impact the performance of the plan does not
seem like a great direction to be moving in.

Also, just in case anyone is misunderstanding this Andres' argument.
It's entirely based on the performance impact of having an additional
node. However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node. The costing, the way it
currently is designed will only choose to use a Result Cache if it
thinks it'll be cheaper to do so and cheaper means having enough cache
hits for the caching overhead to be worthwhile. If we get a good
cache hit ratio then the additional node overhead does not exist
during execution since we don't look any further than the cache during
a cache hit. It would only be a cache miss that requires pulling the
tuples through an additional node. Given perfect statistics (which of
course is impossible) and costs, we'll never slow down the execution
of a plan by having a separate Result Cache node. In reality, poor
statistics, e.g, massive n_distinct underestimations, could cause
slowdowns, but loading this into one node is not going to save us from
that. All that your design will save us from is that 12 nanosecond
per-tuple hop (measured on a 5-year-old laptop) to an additional node
during cache misses. It seems like a strange thing to optimise for,
given that the planner only chooses to use a Result Cache when there's
a good number of expected cache hits.

I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node? Will anyone else?

David

#37Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: David Rowley (#36)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On 25/08/2020 20:48, David Rowley wrote:

On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this. I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.
Then after a few iterations of that, we end up with with a single node
in EXPLAIN that nobody can figure out what it does. "Here Be Dragons",
as Tom said. That might seem like a bit of an exaggeration, but it is
important to understand that this would start us down that path, and
the more steps you take down that path, the harder it is to return
from it.

[...]

I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node? Will anyone else?

David

From my own experience, and thinking about issues like this, my
thinking is that keeping them separate adds robustness wrt change. Presumably
common code can be extracted out, to avoid excessive code duplication?

-- Gavin

#38Andres Freund
andres@anarazel.de
In reply to: David Rowley (#36)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:

On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this. I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.

Also, just in case anyone is misunderstanding this Andres' argument.
It's entirely based on the performance impact of having an additional
node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that are then half-magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.

However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node.

There'll be a significant reduction in the increase in performance.

I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node? Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.

Greetings,

Andres Freund

#39Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andres Freund (#38)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:

On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:

While I'm against introducing a separate node for the caching, I'm

*not*

against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a

'Cached

Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this. I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.

Also, just in case anyone is misunderstanding this Andres' argument.
It's entirely based on the performance impact of having an additional
node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.

However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

If this is a key blocking factor for this topic, I'd like to do a simple
hack to put the cache function into the subplan node, then do some tests to
show the real difference. But it would be better to decide first how much
difference should be thought of as a big difference. And for educational
purposes, I'd like to understand where these differences come from. To my
current knowledge, my basic idea is that it saves some function calls?

I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node? Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.

Nice to see the different opinions from two great guys and interesting to
see how this can be resolved at last:)

--
Best Regards
Andy Fan

#40Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andres Freund (#38)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:

On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:

While I'm against introducing a separate node for the caching, I'm

*not*

against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a

'Cached

Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this. I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.

Also, just in case anyone is misunderstanding this Andres' argument.
It's entirely based on the performance impact of having an additional
node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.

Sorry that I missed this when I replied to the last thread. I understand
this, I remain neutral about this.

However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node? Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.

Greetings,

Andres Freund

--
Best Regards
Andy Fan

#41David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#39)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 26 Aug 2020 at 05:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres@anarazel.de> wrote:

On 2020-08-25 20:48:37 +1200, David Rowley wrote:

Also, just in case anyone is misunderstanding this Andres' argument.
It's entirely based on the performance impact of having an additional
node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.

However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

If this is a key blocking factor for this topic, I'd like to do a simple hack
to put the cache function into the subplan node, then do some tests to
show the real difference. But it is better to decide how much difference
can be thought of as a big difference. And for education purposes,
I'd like to understand where these differences come from. For my
current knowledge, my basic idea is it saves some function calls?

If you test this, the cache hit ratio will be pretty key to the
results. You'd notice the overhead much less with a larger cache hit
ratio since you're not pulling the tuple through as deeply nested a node.
I'm unsure how you'd determine what a good cache hit ratio to test
with is. The lower the expected cache hit ratio, the higher
the cost of the Result Cache node will be, so the planner has less
chance of choosing to use it. Maybe some experiments will find a
case where the planner picks a Result Cache plan with a low hit ratio
that can be tested.

Say you find a case with the hit ratio of 90%. Going by [1] I found
pulling a tuple through an additional node to cost about 12
nanoseconds on an intel 4712HQ CPU. With a hit ratio of 90% we'll
only pull 10% of tuples through the additional node, so that's about
1.2 nanoseconds per tuple, or 1.2 milliseconds per million tuples. It
might become hard to measure above the noise. More costly inner scans
will have the planner choose to Result Cache with lower estimated hit
ratios, but in that case, pulling the tuple through the additional
node during a cache miss will be less noticeable due to the more
costly inner side of the join.

Likely you could test the overhead only in theory without going to the
trouble of adapting the code to make SubPlan and Nested Loop do the
caching internally. If you just modify ExecResultCache() to have it
simply return its subnode, then measure the performance with and
without enable_resultcache, you should get an idea of the per-tuple
overhead of pulling the tuple through the additional node on your CPU.
After you know that number, you could put the code back to what the
patches have and then experiment with a number of cases to find a case
that chooses Result Cache and gets a low hit ratio.
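
To be clear on how small that throwaway change could be, something like
the following would do. This is a sketch only, written against the v7
patch's nodeResultCache.c and assuming the cached subplan hangs off
outerPlanState(), as in the posted patches:

/*
 * Measurement hack only: bypass the cache completely and just forward
 * tuples from the subnode, leaving only the cost of the extra node hop.
 */
static TupleTableSlot *
ExecResultCache(PlanState *pstate)
{
	return ExecProcNode(outerPlanState(pstate));
}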

For example, from the plan I used in the initial email on this thread:

-> Index Only Scan using lookup_a_idx on lookup l
(actual time=0.002..0.011 rows=100 loops=1000)
Index Cond: (a = hk.thousand)
Heap Fetches: 0
Planning Time: 0.113 ms
Execution Time: 1876.741 ms

I don't have the exact per-tuple overhead on the machine I ran that
on, but it's an AMD 3990x CPU, so I'll guess the overhead is about 8
nanoseconds per tuple, given I found it to be 12 nanoseconds on a 2014
CPU. If that's right, then the overhead is something like 8 * 100
(rows) * 1000 (loops) = 800000 nanoseconds = 0.8 milliseconds. If I
compare that to the execution time of the query, it's about 0.04%.

I imagine we'll need to find something with a much worse hit ratio so
we can actually measure the overhead.

David

[1]: /messages/by-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com

#42Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#41)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, Aug 26, 2020 at 8:14 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 26 Aug 2020 at 05:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres@anarazel.de>

wrote:

On 2020-08-25 20:48:37 +1200, David Rowley wrote:

Also, just in case anyone is misunderstanding this Andres' argument.
It's entirely based on the performance impact of having an additional
node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.

However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

If this is a key blocking factor for this topic, I'd like to do a simple

hack

to put the cache function into the subplan node, then do some tests to
show the real difference. But it is better to decide how much difference
can be thought of as a big difference. And for education purposes,
I'd like to understand where these differences come from. For my
current knowledge, my basic idea is it saves some function calls?

If testing this, the cache hit ratio will be pretty key to the
results. You'd notice the overhead much less with a larger cache hit
ratio since you're not pulling the tuple from as deeply a nested node.
I'm unsure how you'd determine what is a good cache hit ratio to
test it with.

I wanted to test the worst case, where the cache hit ratio is 0, and then
compare the difference between putting the cache in a dedicated
node and in a SubPlan node. However, we have a better way
to test the difference based on your message below.

The lower the cache expected cache hit ratio, the higher

the cost of the Result Cache node will be, so the planner has less
chance of choosing to use it.

IIRC, we add the ResultCache for subplan nodes unconditionally now.
The main reason is that we lack an ndistinct estimation during subquery
planning. Tom suggested converting the AlternativeSubPlan to SubPlan
in setrefs.c [1], and I also ran into a case that can be resolved if we do
such a conversion even earlier [2]; the basic idea is that we can do such a
conversion once we can get the actual values for the subplan.

something like

    if (bms_is_subset(subplan->deps_relids, rel->relids))
    {
        convert_alternativesubplans_to_subplan(rel);
    }

You can see if that can be helpful for ResultCache in this use case. My
patch in [2] is still at a very PoC stage, so it only takes care of subplans
in rel->reltarget.

Say you find a case with the hit ratio of 90%. Going by [1] I found
pulling a tuple through an additional node to cost about 12
nanoseconds on an intel 4712HQ CPU. With a hit ratio of 90% we'll
only pull 10% of tuples through the additional node, so that's about
1.2 nanoseconds per tuple, or 1.2 milliseconds per million tuples. It
might become hard to measure above the noise. More costly inner scans
will have the planner choose to Result Cache with lower estimated hit
ratios, but in that case, pulling the tuple through the additional
node during a cache miss will be less noticeable due to the more
costly inner side of the join.

Likely you could test the overhead only in theory without going to the
trouble of adapting the code to make SubPlan and Nested Loop do the
caching internally. If you just modify ExecResultCache() to have it
simply return its subnode, then measure the performance with and
without enable_resultcache, you should get an idea of the per-tuple
overhead of pulling the tuple through the additional node on your CPU.

Thanks for the hints. I think we can test it even more easily with a Limit node.

create table test_pull_tuples(a int);
insert into test_pull_tuples select i from generate_series(1, 100000) i;
-- test with pgbench.
select * from test_pull_tuples;                 18.850 ms
select * from test_pull_tuples limit 100000;    20.500 ms

Basically it is about 16 nanoseconds per tuple ((20.500 - 18.850) ms spread
over 100,000 tuples) on my Intel(R) Xeon(R) CPU E5-2650. Personally I'd say
the performance difference is negligible unless I see some different numbers.

[1]: /messages/by-id/1992952.1592785225@sss.pgh.pa.us
[2]: /messages/by-id/CAKU4AWoMRzZKk1vPstKTjS7sYeN43j8WtsAZy2pv73vm_E_6dA@mail.gmail.com

--
Best Regards
Andy Fan

#43David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#38)
1 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 26 Aug 2020 at 03:52, Andres Freund <andres@anarazel.de> wrote:

On 2020-08-25 20:48:37 +1200, David Rowley wrote:

However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

So I did a very rough-cut change to the patch to have the caching be
part of Nested Loop. It can be applied on top of the other 3 v7
patches.

Performance-wise, the test I did shows performance actually being
worse than with the Result Cache as a separate node. The reason for this
is mostly that Nested Loop projects.
Each time I fetch a MinimalTuple from the cache, the patch must deform
it in order to store it in the virtual inner tuple slot for the nested
loop. Having the Result Cache as a separate node can skip this step, as
its result tuple slot is a TTSOpsMinimalTuple, so we can just store
the cached MinimalTuple right into the slot without any
deforming/copying.

Here's an example of a query that's now slower:

select count(*) from hundredk hk inner join lookup100 l on hk.one = l.a;

In this case, the original patch does not have to deform the
MinimalTuple from the cache, as the count(*) does not require any Vars
from it. With the rough patch that's attached, the MinimalTuple is
deformed during the slot conversion in ExecCopySlot(). The
slowdown exists no matter which column of the hundredk table I join to
(schema in [1]).
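
To show where that extra deform comes from, here's a rough sketch of the
two code paths on a cache hit. The slot variable names are made up, but
ExecStoreMinimalTuple() and ExecCopySlot() are the real slot routines
involved:

/* Separate Result Cache node: its result slot is TTSOpsMinimalTuple, so
 * a cache hit can hand back the stored MinimalTuple without deforming. */
ExecStoreMinimalTuple(cached_mtuple, rc_result_slot, false);

/* Caching built into Nested Loop: the inner tuple slot is virtual, so the
 * cached MinimalTuple must be copied across, and ExecCopySlot() deforms it
 * to fill the virtual slot's datum/isnull arrays. */
ExecStoreMinimalTuple(cached_mtuple, rc_tmp_minimal_slot, false);
ExecCopySlot(nl_inner_virtual_slot, rc_tmp_minimal_slot);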

Performance comparison is as follows:

v7 (Result Cache as a separate node)
postgres=# explain (analyze, timing off) select count(*) from hundredk
hk inner join lookup l on hk.one = l.a;
Execution Time: 652.582 ms

v7 + attached rough patch
postgres=# explain (analyze, timing off) select count(*) from hundredk
hk inner join lookup l on hk.one = l.a;
Execution Time: 843.566 ms

I've not yet thought of any way to get rid of the needless
MinimalTuple deform. I suppose the cache could just have already
deformed tuples, but that requires more memory and would result in
worse cache hit ratios for workloads where the cache gets filled.

I'm open to ideas to make the comparison fairer.

(Renamed the patch file to .txt to stop the CFbot getting upset with me)

David

[1]: /messages/by-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com

Attachments:

resultcache_in_nestloop_hacks.patch.txttext/plain; charset=US-ASCII; name=resultcache_in_nestloop_hacks.patch.txtDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 6d4b9eb3b9..42c6df549f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,8 +108,7 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
-static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
-								  ExplainState *es);
+static void show_resultcache_info(NestLoopState *nlstate, List *ancestors, ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1494,10 +1493,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					 * For historical reasons, the join type is interpolated
 					 * into the node type name...
 					 */
-					if (((Join *) plan)->jointype != JOIN_INNER)
+					if (((Join *)plan)->jointype != JOIN_INNER)
 						appendStringInfo(es->str, " %s Join", jointype);
 					else if (!IsA(plan, NestLoop))
 						appendStringInfoString(es->str, " Join");
+					else if (castNode(NestLoop, plan)->paramcache)
+						appendStringInfoString(es->str, " Cached");
+
 				}
 				else
 					ExplainPropertyText("Join Type", jointype, es);
@@ -1883,6 +1885,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			break;
 		case T_NestLoop:
+			show_resultcache_info((NestLoopState *) planstate, ancestors, es);
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
 			if (((NestLoop *) plan)->join.joinqual)
@@ -1963,10 +1966,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
-		case T_ResultCache:
-			show_resultcache_info(castNode(ResultCacheState, planstate),
-								  ancestors, es);
-			break;
+		//case T_ResultCache:
+		//	show_resultcache_info(castNode(ResultCacheState, planstate),
+		//						  ancestors, es);
+		//	break;
 		default:
 			break;
 	}
@@ -3041,15 +3044,19 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 }
 
 static void
-show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+show_resultcache_info(NestLoopState *nlstate, List *ancestors, ExplainState *es)
 {
-	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	Plan	   *plan = ((PlanState *) nlstate)->plan;
+	ResultCacheState *rcstate;
 	ListCell   *lc;
 	List	   *context;
 	StringInfoData keystr;
 	char	   *seperator = "";
 	bool		useprefix;
 
+	if (nlstate->nl_pcache == NULL)
+		return;
+
 	initStringInfo(&keystr);
 
 	/* XXX surely we'll always have more than one if we have a resultcache? */
@@ -3060,7 +3067,7 @@ show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *
 									   plan,
 									   ancestors);
 
-	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	foreach(lc, ((NestLoop *) plan)->param_exprs)
 	{
 		Node	   *expr = (Node *) lfirst(lc);
 
@@ -3086,6 +3093,8 @@ show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *
 	if (!es->analyze)
 		return;
 
+
+	rcstate = nlstate->nl_pcache;
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 68920ecd89..f9c2f80c79 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,7 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
-#include "executor/nodeResultCache.h"
+//#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -250,9 +250,9 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
-		case T_ResultCacheState:
-			ExecReScanResultCache((ResultCacheState *) node);
-			break;
+		//case T_ResultCacheState:
+		//	ExecReScanResultCache((ResultCacheState *) node);
+		//	break;
 
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 459e9dd3e9..37cfa36881 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,7 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
-#include "executor/nodeResultCache.h"
+//#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -294,10 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
-		case T_ResultCacheState:
-			/* even when not parallel-aware, for EXPLAIN ANALYZE */
-			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
-			break;
+		//case T_ResultCacheState:
+		//	/* even when not parallel-aware, for EXPLAIN ANALYZE */
+		//	ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+		//	break;
 		default:
 			break;
 	}
@@ -518,10 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
-		case T_ResultCacheState:
-			/* even when not parallel-aware, for EXPLAIN ANALYZE */
-			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
-			break;
+		//case T_ResultCacheState:
+		//	/* even when not parallel-aware, for EXPLAIN ANALYZE */
+		//	ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+		//	break;
 		default:
 			break;
 	}
@@ -998,9 +998,9 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
-		case T_ResultCacheState:
-			/* these nodes have DSM state, but no reinitialization is required */
-			break;
+		//case T_ResultCacheState:
+		//	/* these nodes have DSM state, but no reinitialization is required */
+		//	break;
 
 		default:
 			break;
@@ -1068,9 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
-		case T_ResultCacheState:
-			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
-			break;
+		//case T_ResultCacheState:
+		//	ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+		//	break;
 		default:
 			break;
 	}
@@ -1363,11 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
-		case T_ResultCacheState:
-			/* even when not parallel-aware, for EXPLAIN ANALYZE */
-			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
-											pwcxt);
-			break;
+		//case T_ResultCacheState:
+		//	/* even when not parallel-aware, for EXPLAIN ANALYZE */
+		//	ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+		//									pwcxt);
+		//	break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index fbbe667cc1..e5b8c74da7 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,7 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
-#include "executor/nodeResultCache.h"
+//#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -320,10 +320,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
-		case T_ResultCache:
-			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
-													   estate, eflags);
-			break;
+		//case T_ResultCache:
+		//	result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+		//											   estate, eflags);
+		//	break;
 
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
@@ -709,9 +709,9 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
-		case T_ResultCacheState:
-			ExecEndResultCache((ResultCacheState *) node);
-			break;
+		//case T_ResultCacheState:
+		//	ExecEndResultCache((ResultCacheState *) node);
+		//	break;
 
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b07c2996d4..97213071d5 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -23,9 +23,21 @@
 
 #include "executor/execdebug.h"
 #include "executor/nodeNestloop.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "utils/memutils.h"
 
+static inline TupleTableSlot *
+FetchInnerTuple(ResultCacheState *rcstate, PlanState *innerPlan)
+{
+	/* No caching? Just exec the inner node */
+	if (rcstate == NULL)
+		return ExecProcNode(innerPlan);
+	/* Otherwise let the cache deal with it */
+	else
+		return ExecResultCache(rcstate, innerPlan);
+}
+
 
 /* ----------------------------------------------------------------
  *		ExecNestLoop(node)
@@ -150,6 +162,11 @@ ExecNestLoop(PlanState *pstate)
 			 */
 			ENL1_printf("rescanning inner plan");
 			ExecReScan(innerPlan);
+
+			/* When using a result cache, reset the state ready for another lookup */
+			if (node->nl_pcache)
+				ExecResultCacheFinishScan(node->nl_pcache);
+
 		}
 
 		/*
@@ -157,7 +174,7 @@ ExecNestLoop(PlanState *pstate)
 		 */
 		ENL1_printf("getting new inner tuple");
 
-		innerTupleSlot = ExecProcNode(innerPlan);
+		innerTupleSlot = FetchInnerTuple(node->nl_pcache, innerPlan);
 		econtext->ecxt_innertuple = innerTupleSlot;
 
 		if (TupIsNull(innerTupleSlot))
@@ -345,6 +362,13 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 */
 	nlstate->nl_NeedNewOuter = true;
 	nlstate->nl_MatchedOuter = false;
+	nlstate->nl_ParamCache = node->paramcache;
+
+	/* Setup the result cache if enabled */
+	if (nlstate->nl_ParamCache)
+		nlstate->nl_pcache = ExecInitResultCache(node, (PlanState *) nlstate, (PlanState *) innerPlanState(nlstate));
+	else
+		nlstate->nl_pcache = NULL;
 
 	NL1_printf("ExecInitNestLoop: %s\n",
 			   "node initialized");
@@ -352,6 +376,7 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	return nlstate;
 }
 
+
 /* ----------------------------------------------------------------
  *		ExecEndNestLoop
  *
@@ -380,6 +405,9 @@ ExecEndNestLoop(NestLoopState *node)
 	ExecEndNode(outerPlanState(node));
 	ExecEndNode(innerPlanState(node));
 
+	if (node->nl_pcache)
+		ExecEndResultCache(node->nl_pcache);
+
 	NL1_printf("ExecEndNestLoop: %s\n",
 			   "node processing ended");
 }
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 09b25ea184..da5edf9c06 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -66,7 +66,6 @@
 										 * subplan without caching anything */
 #define RC_END_OF_SCAN				5	/* Ready for rescan */
 
-
 /* Helper macros for memory accounting */
 #define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
 										 sizeof(ResultCacheKey) + \
@@ -179,7 +178,7 @@ ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
 					  const ResultCacheKey *key2)
 {
 	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
-	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	ExprContext *econtext = rcstate->ps_ExprContext;
 	TupleTableSlot *tslot = rcstate->tableslot;
 	TupleTableSlot *pslot = rcstate->probeslot;
 
@@ -223,7 +222,7 @@ prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
 		/* Set the probeslot's values based on the current parameter values */
 		for (int i = 0; i < numKeys; i++)
 			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
-												rcstate->ss.ps.ps_ExprContext,
+												rcstate->ps_ExprContext,
 												&pslot->tts_isnull[i]);
 	}
 	else
@@ -243,7 +242,7 @@ prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
  *		Remove all tuples from a cache entry, leaving an empty cache entry.
  *		Also update memory accounting to reflect the removal of the tuples.
  */
-static inline void
+static void
 entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
 {
 	ResultCacheTuple *tuple = entry->tuplehead;
@@ -590,21 +589,32 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	return true;
 }
 
-static TupleTableSlot *
-ExecResultCache(PlanState *pstate)
+/*
+ * Callers should call this once they have finished a parameterized scan.
+ */
+void
+ExecResultCacheFinishScan(ResultCacheState *rcstate)
+{
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	rcstate->entry = NULL;
+	rcstate->last_tuple = NULL;
+}
+
+TupleTableSlot *
+ExecResultCache(ResultCacheState *rcstate, PlanState *innerPlan)
 {
-	ResultCacheState *node = castNode(ResultCacheState, pstate);
-	PlanState  *outerNode;
 	TupleTableSlot *slot;
 
-	switch (node->rc_status)
+	switch (rcstate->rc_status)
 	{
 		case RC_CACHE_LOOKUP:
 			{
 				ResultCacheEntry *entry;
 				bool		found;
 
-				Assert(node->entry == NULL);
+				Assert(rcstate->entry == NULL);
 
 				/*
 				 * We're only ever in this state for the first call of the
@@ -619,44 +629,43 @@ ExecResultCache(PlanState *pstate)
 				 * one there, we'll try to cache it.
 				 */
 
-				/* see if we've got anything cached for the current parameters */
-				entry = cache_lookup(node, &found);
+				 /* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(rcstate, &found);
 
 				if (found && entry->complete)
 				{
-					node->stats.cache_hits += 1;	/* stats update */
+					rcstate->stats.cache_hits += 1;	/* stats update */
 
 					/*
 					 * Set last_tuple and entry so that the state
 					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
 					 * tuple for these parameters.
 					 */
-					node->last_tuple = entry->tuplehead;
-					node->entry = entry;
+					rcstate->last_tuple = entry->tuplehead;
+					rcstate->entry = entry;
 
 					/* Fetch the first cached tuple, if there is one */
 					if (entry->tuplehead)
 					{
-						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+						rcstate->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
 
-						slot = node->ss.ps.ps_ResultTupleSlot;
-						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
-											  slot, false);
-
-						return slot;
+						ExecClearTuple(rcstate->cachefoundslot);
+						slot = rcstate->cachefoundslotmin;
+						ExecStoreMinimalTuple(rcstate->last_tuple->mintuple, slot, false);
+						return ExecCopySlot(rcstate->cachefoundslot, slot);
 					}
 					else
 					{
 						/* The cache entry is void of any tuples. */
-						node->rc_status = RC_END_OF_SCAN;
+						rcstate->rc_status = RC_END_OF_SCAN;
 						return NULL;
 					}
 				}
 				else
 				{
-					TupleTableSlot *outerslot;
+					TupleTableSlot *innerslot;
 
-					node->stats.cache_misses += 1;	/* stats update */
+					rcstate->stats.cache_misses += 1;	/* stats update */
 
 					if (found)
 					{
@@ -668,13 +677,12 @@ ExecResultCache(PlanState *pstate)
 						 * guarantee the outer node will produce the tuples in
 						 * the same order as it did last time.
 						 */
-						entry_purge_tuples(node, entry);
+						entry_purge_tuples(rcstate, entry);
 					}
 
 					/* Scan the outer node for a tuple to cache */
-					outerNode = outerPlanState(node);
-					outerslot = ExecProcNode(outerNode);
-					if (TupIsNull(outerslot))
+					innerslot = ExecProcNode(innerPlan);
+					if (TupIsNull(innerslot))
 					{
 						/*
 						 * cache_lookup may have returned NULL due to failure
@@ -686,22 +694,22 @@ ExecResultCache(PlanState *pstate)
 						if (likely(entry))
 							entry->complete = true;
 
-						node->rc_status = RC_END_OF_SCAN;
+						rcstate->rc_status = RC_END_OF_SCAN;
 						return NULL;
 					}
 
-					node->entry = entry;
+					rcstate->entry = entry;
 
 					/*
 					 * If we failed to create the entry or failed to store the
 					 * tuple in the entry, then go into bypass mode.
 					 */
 					if (unlikely(entry == NULL ||
-								 !cache_store_tuple(node, outerslot)))
+						!cache_store_tuple(rcstate, innerslot)))
 					{
-						node->stats.cache_overflows += 1;	/* stats update */
+						rcstate->stats.cache_overflows += 1;	/* stats update */
 
-						node->rc_status = RC_CACHE_BYPASS_MODE;
+						rcstate->rc_status = RC_CACHE_BYPASS_MODE;
 
 						/*
 						 * No need to clear out last_tuple as we'll stay in
@@ -716,43 +724,41 @@ ExecResultCache(PlanState *pstate)
 						 * allows cache lookups to work even when the scan has
 						 * not been executed to completion.
 						 */
-						entry->complete = node->singlerow;
-						node->rc_status = RC_FILLING_CACHE;
+						entry->complete = rcstate->singlerow;
+						rcstate->rc_status = RC_FILLING_CACHE;
 					}
 
-					slot = node->ss.ps.ps_ResultTupleSlot;
-					ExecCopySlot(slot, outerslot);
-					return slot;
+					return innerslot;
 				}
 			}
 
 		case RC_CACHE_FETCH_NEXT_TUPLE:
 			{
 				/* We shouldn't be in this state if these are not set */
-				Assert(node->entry != NULL);
-				Assert(node->last_tuple != NULL);
+				Assert(rcstate->entry != NULL);
+				Assert(rcstate->last_tuple != NULL);
 
 				/* Skip to the next tuple to output */
-				node->last_tuple = node->last_tuple->next;
+				rcstate->last_tuple = rcstate->last_tuple->next;
 
 				/* No more tuples in the cache */
-				if (node->last_tuple == NULL)
+				if (rcstate->last_tuple == NULL)
 				{
-					node->rc_status = RC_END_OF_SCAN;
+					rcstate->rc_status = RC_END_OF_SCAN;
 					return NULL;
 				}
 
-				slot = node->ss.ps.ps_ResultTupleSlot;
-				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
-									  false);
+				ExecClearTuple(rcstate->cachefoundslot);
+				slot = rcstate->cachefoundslotmin;
+				ExecStoreMinimalTuple(rcstate->last_tuple->mintuple, slot, false);
 
-				return slot;
+				return ExecCopySlot(rcstate->cachefoundslot, slot);
 			}
 
 		case RC_FILLING_CACHE:
 			{
-				TupleTableSlot *outerslot;
-				ResultCacheEntry *entry = node->entry;
+				TupleTableSlot *innerslot;
+				ResultCacheEntry *entry = rcstate->entry;
 
 				/* entry should already have been set by RC_CACHE_LOOKUP */
 				Assert(entry != NULL);
@@ -762,13 +768,12 @@ ExecResultCache(PlanState *pstate)
 				 * miss and are populating the cache with the current scan
 				 * tuples.
 				 */
-				outerNode = outerPlanState(node);
-				outerslot = ExecProcNode(outerNode);
-				if (TupIsNull(outerslot))
+				innerslot = ExecProcNode(innerPlan);
+				if (TupIsNull(innerslot))
 				{
 					/* No more tuples.  Mark it as complete */
 					entry->complete = true;
-					node->rc_status = RC_END_OF_SCAN;
+					rcstate->rc_status = RC_END_OF_SCAN;
 					return NULL;
 				}
 				else
@@ -782,12 +787,12 @@ ExecResultCache(PlanState *pstate)
 						elog(ERROR, "cache entry already complete");
 
 					/* Record the tuple in the current cache entry */
-					if (unlikely(!cache_store_tuple(node, outerslot)))
+					if (unlikely(!cache_store_tuple(rcstate, innerslot)))
 					{
 						/* Couldn't store it?  Handle overflow */
-						node->stats.cache_overflows += 1;	/* stats update */
+						rcstate->stats.cache_overflows += 1;	/* stats update */
 
-						node->rc_status = RC_CACHE_BYPASS_MODE;
+						rcstate->rc_status = RC_CACHE_BYPASS_MODE;
 
 						/*
 						 * No need to clear out entry or last_tuple as we'll
@@ -795,32 +800,27 @@ ExecResultCache(PlanState *pstate)
 						 */
 					}
 
-					slot = node->ss.ps.ps_ResultTupleSlot;
-					ExecCopySlot(slot, outerslot);
-					return slot;
+					return innerslot;
 				}
 			}
 
 		case RC_CACHE_BYPASS_MODE:
 			{
-				TupleTableSlot *outerslot;
+				TupleTableSlot *innerslot;
 
 				/*
 				 * When in bypass mode we just continue to read tuples without
 				 * caching.  We need to wait until the next rescan before we
 				 * can come out of this mode.
 				 */
-				outerNode = outerPlanState(node);
-				outerslot = ExecProcNode(outerNode);
-				if (TupIsNull(outerslot))
+				innerslot = ExecProcNode(innerPlan);
+				if (TupIsNull(innerslot))
 				{
-					node->rc_status = RC_END_OF_SCAN;
+					rcstate->rc_status = RC_END_OF_SCAN;
 					return NULL;
 				}
 
-				slot = node->ss.ps.ps_ResultTupleSlot;
-				ExecCopySlot(slot, outerslot);
-				return slot;
+				return innerslot;
 			}
 
 		case RC_END_OF_SCAN:
@@ -833,60 +833,34 @@ ExecResultCache(PlanState *pstate)
 
 		default:
 			elog(ERROR, "unrecognized resultcache state: %d",
-				 (int) node->rc_status);
+				 (int) rcstate->rc_status);
 			return NULL;
 	}							/* switch */
 }
 
 ResultCacheState *
-ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+ExecInitResultCache(NestLoop *node, PlanState *planstate, PlanState *cache_planstate)
 {
 	ResultCacheState *rcstate = makeNode(ResultCacheState);
-	Plan	   *outerNode;
 	int			i;
 	int			nkeys;
 	Oid		   *eqfuncoids;
 
-	/* check for unsupported flags */
-	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
-
-	rcstate->ss.ps.plan = (Plan *) node;
-	rcstate->ss.ps.state = estate;
-	rcstate->ss.ps.ExecProcNode = ExecResultCache;
-
-	/*
-	 * Miscellaneous initialization
-	 *
-	 * create expression context for node
-	 */
-	ExecAssignExprContext(estate, &rcstate->ss.ps);
-
-	outerNode = outerPlan(node);
-	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
-
-	/*
-	 * Initialize return slot and type. No need to initialize projection info
-	 * because this node doesn't do projections.
-	 */
-	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
-	rcstate->ss.ps.ps_ProjInfo = NULL;
-
-	/*
-	 * Initialize scan slot and type.
-	 */
-	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
-
-	/*
-	 * Set the state machine to lookup the cache.  We won't find anything
-	 * until we cache something, but this saves a special case to create the
-	 * first entry.
-	 */
+	rcstate->ps_ExprContext = CreateExprContext(planstate->state);
 	rcstate->rc_status = RC_CACHE_LOOKUP;
 
 	rcstate->nkeys = nkeys = node->numKeys;
 	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
 	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
 												  &TTSOpsMinimalTuple);
+	/* XXX this should make a slot of the same type as cache_planstate's result slot.  For now
+	 * that'll always be a nested loop, so just make a virtual slot, which is what nested loop
+	 * uses.
+	 */
+	rcstate->cachefoundslot = MakeSingleTupleTableSlot(cache_planstate->ps_ResultTupleDesc,
+		&TTSOpsVirtual);
+	rcstate->cachefoundslotmin = MakeSingleTupleTableSlot(cache_planstate->ps_ResultTupleDesc,
+		&TTSOpsMinimalTuple);
 	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
 												  &TTSOpsVirtual);
 
@@ -910,7 +884,7 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 
 		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
 
-		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *)planstate);
 		eqfuncoids[i] = get_opcode(hashop);
 	}
 
@@ -919,7 +893,7 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 													eqfuncoids,
 													node->collations,
 													node->param_exprs,
-													(PlanState *) rcstate);
+													(PlanState *) planstate);
 
 	pfree(eqfuncoids);
 	rcstate->mem_used = 0;
@@ -970,57 +944,12 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 void
 ExecEndResultCache(ResultCacheState *node)
 {
-	/*
-	 * When ending a parallel worker, copy the statistics gathered by the
-	 * worker back into shared memory so that it can be picked up by the main
-	 * process to report in EXPLAIN ANALYZE.
-	 */
-	if (node->shared_info && IsParallelWorker())
-	{
-		ResultCacheInstrumentation *si;
-
-		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
-		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
-	}
-
 	/* Remove the cache context */
 	MemoryContextDelete(node->tableContext);
 
-	ExecClearTuple(node->ss.ss_ScanTupleSlot);
-	/* must drop pointer to cache result tuple */
-	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
-	/*
-	 * free exprcontext
-	 */
-	ExecFreeExprContext(&node->ss.ps);
-
-	/*
-	 * shut down the subplan
-	 */
-	ExecEndNode(outerPlanState(node));
-}
-
-void
-ExecReScanResultCache(ResultCacheState *node)
-{
-	PlanState  *outerPlan = outerPlanState(node);
-
-	/* Mark that we must lookup the cache for a new set of parameters */
-	node->rc_status = RC_CACHE_LOOKUP;
-
-	/* nullify pointers used for the last scan */
-	node->entry = NULL;
-	node->last_tuple = NULL;
-
-	/*
-	 * if chgParam of subnode is not null then plan will be re-scanned by
-	 * first ExecProcNode.
-	 */
-	if (outerPlan->chgParam == NULL)
-		ExecReScan(outerPlan);
-
+	ExecClearTuple(node->cachefoundslot);
+	ExecClearTuple(node->cachefoundslotmin);
+	FreeExprContext(node->ps_ExprContext, false);
 }
 
 /*
@@ -1035,88 +964,3 @@ ExecEstimateCacheEntryOverheadBytes(double ntuples)
 		sizeof(ResultCacheTuple) * ntuples;
 }
 
-/* ----------------------------------------------------------------
- *						Parallel Query Support
- * ----------------------------------------------------------------
- */
-
- /* ----------------------------------------------------------------
-  *		ExecResultCacheEstimate
-  *
-  *		Estimate space required to propagate result cache statistics.
-  * ----------------------------------------------------------------
-  */
-void
-ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
-{
-	Size		size;
-
-	/* don't need this if not instrumenting or no workers */
-	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
-		return;
-
-	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
-	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
-	shm_toc_estimate_chunk(&pcxt->estimator, size);
-	shm_toc_estimate_keys(&pcxt->estimator, 1);
-}
-
-/* ----------------------------------------------------------------
- *		ExecResultCacheInitializeDSM
- *
- *		Initialize DSM space for result cache statistics.
- * ----------------------------------------------------------------
- */
-void
-ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
-{
-	Size		size;
-
-	/* don't need this if not instrumenting or no workers */
-	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
-		return;
-
-	size = offsetof(SharedResultCacheInfo, sinstrument)
-		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
-	node->shared_info = shm_toc_allocate(pcxt->toc, size);
-	/* ensure any unfilled slots will contain zeroes */
-	memset(node->shared_info, 0, size);
-	node->shared_info->num_workers = pcxt->nworkers;
-	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
-				   node->shared_info);
-}
-
-/* ----------------------------------------------------------------
- *		ExecResultCacheInitializeWorker
- *
- *		Attach worker to DSM space for result cache statistics.
- * ----------------------------------------------------------------
- */
-void
-ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
-{
-	node->shared_info =
-		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
-}
-
-/* ----------------------------------------------------------------
- *		ExecResultCacheRetrieveInstrumentation
- *
- *		Transfer result cache statistics from DSM to private memory.
- * ----------------------------------------------------------------
- */
-void
-ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
-{
-	Size		size;
-	SharedResultCacheInfo *si;
-
-	if (node->shared_info == NULL)
-		return;
-
-	size = offsetof(SharedResultCacheInfo, sinstrument)
-		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
-	si = palloc(size);
-	memcpy(si, node->shared_info, size);
-	node->shared_info = si;
-}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e50844df9b..0101d719c4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2298,148 +2298,6 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
-/*
- * cost_resultcache_rescan
- *	  Determines the estimated cost of rescanning a ResultCache node.
- *
- * In order to estimate this, we must gain knowledge of how often we expect to
- * be called and how many distinct sets of parameters we are likely to be
- * called with. If we expect a good cache hit ratio, then we can set our
- * costs to account for that hit ratio, plus a little bit of cost for the
- * caching itself.  Caching will not work out well if we expect to be called
- * with too many distinct parameter values.  The worst-case here is that we
- * never see the same parameter values twice, in which case we'd never get a
- * cache hit and caching would be a complete waste of effort.
- */
-static void
-cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
-						Cost *rescan_startup_cost, Cost *rescan_total_cost)
-{
-	Cost		input_startup_cost = rcpath->subpath->startup_cost;
-	Cost		input_total_cost = rcpath->subpath->total_cost;
-	double		tuples = rcpath->subpath->rows;
-	double		calls = rcpath->calls;
-	int			width = rcpath->subpath->pathtarget->width;
-	int			flags;
-
-	double		work_mem_bytes;
-	double		est_entry_bytes;
-	double		est_cache_entries;
-	double		ndistinct;
-	double		evict_ratio;
-	double		hit_ratio;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* available cache space */
-	work_mem_bytes = work_mem * 1024L;
-
-	/*
-	 * Set the number of bytes each cache entry should consume in the cache.
-	 * To provide us with better estimations on how many cache entries we can
-	 * store at once we make a call to the excutor here to ask it what memory
-	 * overheads there are for a single cache entry.
-	 *
-	 * XXX we also store the cache key, but that's not accounted for here.
-	 */
-	est_entry_bytes = relation_byte_size(tuples, width) +
-		ExecEstimateCacheEntryOverheadBytes(tuples);
-
-	/* estimate on the upper limit of cache entries we can hold at once */
-	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
-
-	/* estimate on the distinct number of parameter values */
-	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
-									&flags);
-
-	/*
-	 * When the estimation fell back on using a default value, it's a bit too
-	 * risky to assume that it's ok to use a Result Cache.  The use of a
-	 * default could cause us to use a Result Cache when it's really
-	 * inappropriate to do so.  If we see that this has been done then we'll
-	 * assume that every call will have unique parameters, which will almost
-	 * certainly mean a ResultCachePath will never survive add_path().
-	 */
-	if ((flags & SELFLAG_USED_DEFAULT) != 0)
-		ndistinct = calls;
-
-	/*
-	 * Since we've already estimated the maximum number of entries we can
-	 * store at once and know the estimated number of distinct values we'll be
-	 * called with, well take this opportunity to set the path's est_entries.
-	 * This will ultimately determine the hash table size that the executor
-	 * will use.  If we leave this at zero the executor will just choose the
-	 * size itself.  Really this is not the right place to do this, but it's
-	 * convenient since everything is already calculated.
-	 */
-	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
-							  PG_UINT32_MAX);
-
-
-	/*
-	 * When the number of distinct parameter values is above the amount we can
-	 * store in the cache, then we'll have to evict some entries from the
-	 * cache.  This is not free, so here we estimate how often we'll incur the
-	 * cost of that eviction.
-	 */
-	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
-
-	/*
-	 * In order to estimate how costly a single scan will be, we need to
-	 * attempt to estimate what the cache hit ratio will be.  To do that we
-	 * must look at how many scans are estimated in total of this node and how
-	 * many of those scans we expect to get a cache hit.
-	 */
-	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
-		(ndistinct / calls);
-
-	/* Ensure we don't go negative */
-	hit_ratio = Max(hit_ratio, 0);
-
-	/*
-	 * Set the total_cost accounting for the expected cache hit ratio.  We
-	 * also add on a cpu_operator_cost to account for a cache lookup. This
-	 * will happen regardless of if it's a cache hit or not.
-	 */
-	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
-
-	/* Now adjust the total cost to account for cache evictions */
-
-	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
-	total_cost += cpu_tuple_cost * evict_ratio;
-
-	/*
-	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
-	 * The per-tuple eviction is really just a pfree, so charging a whole
-	 * cpu_operator_cost seems a little excessive.
-	 */
-	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
-
-	/*
-	 * Now adjust for storing things in the cache, since that's not free
-	 * either.  Everything must go in the cache, so we don't proportion this
-	 * over any ratio, just apply it once for the scan.  We charge a
-	 * cpu_tuple_cost for the creation of the cache entry and also a
-	 * cpu_operator_cost for each tuple we expect to cache.
-	 */
-	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
-
-	/*
-	 * Getting the first row must be also be proportioned according to the
-	 * expected cache hit ratio.
-	 */
-	startup_cost = input_startup_cost * (1.0 - hit_ratio);
-
-	/*
-	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
-	 * which we'll do regardless of if it was a cache hit or not.
-	 */
-	startup_cost += cpu_tuple_cost;
-
-	*rescan_startup_cost = startup_cost;
-	*rescan_total_cost = total_cost;
-}
-
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4167,11 +4025,6 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
-		case T_ResultCache:
-			/* All the hard work is done by cost_resultcache_rescan */
-			cost_resultcache_rescan(root, (ResultCachePath *) path,
-									rescan_startup_cost, rescan_total_cost);
-			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index f4c76577ad..5918dd9a3a 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,13 +17,16 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "executor/nodeResultCache.h"
 #include "foreign/fdwapi.h"
+#include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
@@ -554,6 +557,152 @@ get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
 	return NULL;
 }
 
+static double
+relation_byte_size(double tuples, int width)
+{
+	return tuples * (MAXALIGN(width) + MAXALIGN(SizeofHeapTupleHeader));
+}
+
+/*
+ * use_nestedloop_cache
+ *	  Determine whether caching the parameterized inner side of this nested
+ *	  loop is expected to be cheaper than rescanning it each time.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static bool
+use_nestedloop_cache(PlannerInfo *root, NestPath *nlpath)
+{
+	Cost		input_startup_cost = nlpath->innerjoinpath->startup_cost;
+	Cost		input_total_cost = nlpath->innerjoinpath->total_cost;
+	double		tuples = nlpath->innerjoinpath->rows;
+	double		calls = nlpath->outerjoinpath->rows;
+	int			width = nlpath->innerjoinpath->pathtarget->width;
+	int			flags = 0;	/* stays 0 while the estimate_num_groups() call below is commented out */
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = 1; // estimate_num_groups(root, nlpath->rcpath->param_exprs, calls, NULL,
+		//&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	//nlpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+	//	PG_UINT32_MAX);
+
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total of this node and how
+	 * many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
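+
+	/*
+	 * For example (illustrative numbers only): with calls = 100000,
+	 * ndistinct = 200 and est_cache_entries >= 200, this works out to
+	 * 1.0 - 200/100000 = 0.998, i.e. nearly every rescan is expected
+	 * to be served from the cache.
+	 */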
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of if it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of if it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	return total_cost < nlpath->innerjoinpath->total_cost;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -576,8 +725,7 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
-	Path	   *inner_cache_path;
-	bool		added_path = false;
+	ResultCachePath	   *rcpath;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -628,6 +776,7 @@ try_nestloop_path(PlannerInfo *root,
 						  workspace.startup_cost, workspace.total_cost,
 						  pathkeys, required_outer))
 	{
+		NestPath *nlpath;
 		/*
 		 * If the inner path is parameterized, it is parameterized by the
 		 * topmost parent of the outer rel, not the outer rel itself.  Fix
@@ -649,103 +798,37 @@ try_nestloop_path(PlannerInfo *root,
 			}
 		}
 
-		add_path(joinrel, (Path *)
-				 create_nestloop_path(root,
-									  joinrel,
-									  jointype,
-									  &workspace,
-									  extra,
-									  outer_path,
-									  inner_path,
-									  extra->restrictlist,
-									  pathkeys,
-									  required_outer));
-		added_path = true;
-	}
-
-	/*
-	 * See if we can build a result cache path for this inner_path. That might
-	 * make the nested loop cheaper.
-	 */
-	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
-											inner_path, outer_path, jointype,
-											extra);
-
-	if (inner_cache_path == NULL)
-	{
-		if (!added_path)
-			bms_free(required_outer);
-		return;
-	}
-
-	initial_cost_nestloop(root, &workspace, jointype,
-						  outer_path, inner_cache_path, extra);
-
-	if (add_path_precheck(joinrel,
-						  workspace.startup_cost, workspace.total_cost,
-						  pathkeys, required_outer))
-	{
 		/*
-		 * If the inner path is parameterized, it is parameterized by the
-		 * topmost parent of the outer rel, not the outer rel itself.  Fix
-		 * that.
+		 * See if we can build a result cache path for this inner_path. That might
+		 * make the nested loop cheaper.
 		 */
-		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		rcpath = (ResultCachePath *) get_resultcache_path(root, innerrel, outerrel,
+			inner_path, outer_path, jointype,
+			extra);
+
+		nlpath = create_nestloop_path(root,
+			joinrel,
+			jointype,
+			&workspace,
+			extra,
+			outer_path,
+			inner_path,
+			extra->restrictlist,
+			pathkeys,
+			required_outer);
+
+		if (rcpath != NULL)
 		{
-			Path	   *reparameterize_path;
-
-			reparameterize_path = reparameterize_path_by_child(root,
-															   inner_cache_path,
-															   outer_path->parent);
-
-			/*
-			 * If we could not translate the path, we can't create nest loop
-			 * path.
-			 */
-			if (!reparameterize_path)
-			{
-				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
-
-				/* Waste no memory when we reject a path here */
-				list_free(rcpath->hash_operators);
-				list_free(rcpath->param_exprs);
-				pfree(rcpath);
-
-				if (!added_path)
-					bms_free(required_outer);
-				return;
-			}
+			nlpath->use_cache = true;
+			nlpath->hash_operators = rcpath->hash_operators;
+			nlpath->param_exprs = rcpath->param_exprs;
+			nlpath->singlerow = rcpath->singlerow;
+			nlpath->calls = rcpath->calls;
+			nlpath->est_entries = rcpath->est_entries;
 		}
 
-		add_path(joinrel, (Path *)
-				 create_nestloop_path(root,
-									  joinrel,
-									  jointype,
-									  &workspace,
-									  extra,
-									  outer_path,
-									  inner_cache_path,
-									  extra->restrictlist,
-									  pathkeys,
-									  required_outer));
-		added_path = true;
+		add_path(joinrel, (Path *)nlpath);
 	}
-	else
-	{
-		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
-
-		/* Waste no memory when we reject a path here */
-		list_free(rcpath->hash_operators);
-		list_free(rcpath->param_exprs);
-		pfree(rcpath);
-	}
-
-	if (!added_path)
-	{
-		/* Waste no memory when we reject a path here */
-		bms_free(required_outer);
-	}
-
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 45e211262a..7afb7741d0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4147,6 +4147,7 @@ create_nestloop_plan(PlannerInfo *root,
 	Relids		outerrelids;
 	List	   *nestParams;
 	Relids		saveOuterRels = root->curOuterRels;
+	List	   *param_exprs = NIL;
 
 	/* NestLoop can project, so no need to be picky about child tlists */
 	outer_plan = create_plan_recurse(root, best_path->outerjoinpath, 0);
@@ -4157,6 +4158,9 @@ create_nestloop_plan(PlannerInfo *root,
 
 	inner_plan = create_plan_recurse(root, best_path->innerjoinpath, 0);
 
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
 	/* Restore curOuterRels */
 	bms_free(root->curOuterRels);
 	root->curOuterRels = saveOuterRels;
@@ -4204,6 +4208,54 @@ create_nestloop_plan(PlannerInfo *root,
 							  best_path->jointype,
 							  best_path->inner_unique);
 
+	//bool		paramcache;
+	//int			numKeys;		/* size of the two arrays below */
+
+	//Oid		   *hashOperators;	/* hash operators for each key */
+	//Oid		   *collations;		/* cache keys */
+	//List	   *param_exprs;	/* exprs containing parameters */
+	//bool		singlerow;		/* true if the cache entry should be marked as
+	//							 * complete after we store the first tuple in
+	//							 * it. */
+	//uint32		est_entries;	/* The maximum number of entries that the
+	//							 * planner expects will fit in the cache, or 0
+	//							 * if unknown */
+
+	if (best_path->use_cache)
+	{
+		Oid		   *operators;
+		Oid		   *collations;
+		ListCell   *lc;
+		ListCell   *lc2;
+		int			nkeys;
+		int			i;
+
+		join_plan->numKeys = list_length(best_path->param_exprs);
+
+		nkeys = list_length(param_exprs);
+		Assert(nkeys > 0);
+		operators = palloc(nkeys * sizeof(Oid));
+		collations = palloc(nkeys * sizeof(Oid));
+
+		i = 0;
+		forboth(lc, param_exprs, lc2, best_path->hash_operators)
+		{
+			Expr	   *param_expr = (Expr *)lfirst(lc);
+			Oid			opno = lfirst_oid(lc2);
+
+			operators[i] = opno;
+			collations[i] = exprCollation((Node *)param_expr);
+			i++;
+		}
+		join_plan->paramcache = true;
+		join_plan->param_exprs = param_exprs;
+		join_plan->hashOperators = operators;
+		join_plan->collations = collations;
+		join_plan->singlerow = best_path->singlerow;
+		join_plan->est_entries = best_path->est_entries;
+
+	}
+
 	copy_generic_path_info(&join_plan->join.plan, &best_path->path);
 
 	return join_plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3e2c61b0a0..9da223139a 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -137,74 +137,6 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
-
-/*
- * outer_params_hashable
- *		Determine if it's valid to use a ResultCache node to cache already
- *		seen rows matching a given set of parameters instead of performing a
- *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
- *		if all parameters required by this query level can be hashed.  If so,
- *		return true and set 'operators' to the list of hash equality operators
- *		for the given parameters then populate 'param_exprs' with each
- *		PARAM_EXEC parameter that the subplan requires the outer query to pass
- *		it.  When hashing is not possible, false is returned and the two
- *		output lists are unchanged.
- */
-static bool
-outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
-					  List **param_exprs)
-{
-	List	   *oplist = NIL;
-	List	   *exprlist = NIL;
-	ListCell   *lc;
-
-	/* Ensure we're not given a top-level query. */
-	Assert(subroot->parent_root != NULL);
-
-	/*
-	 * It's not valid to use a Result Cache node if there are any volatile
-	 * function in the subquery.  Caching could cause fewer evaluations of
-	 * volatile functions that have side-effects
-	 */
-	if (contain_volatile_functions((Node *) subroot->parse))
-		return false;
-
-	foreach(lc, plan_params)
-	{
-		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
-		TypeCacheEntry *typentry;
-		Node	   *expr = ppi->item;
-		Param	   *param;
-
-		param = makeNode(Param);
-		param->paramkind = PARAM_EXEC;
-		param->paramid = ppi->paramId;
-		param->paramtype = exprType(expr);
-		param->paramtypmod = exprTypmod(expr);
-		param->paramcollid = exprCollation(expr);
-		param->location = -1;
-
-		typentry = lookup_type_cache(param->paramtype,
-									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
-
-		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
-		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
-		{
-			list_free(oplist);
-			list_free(exprlist);
-			return false;
-		}
-
-		oplist = lappend_oid(oplist, typentry->eq_opr);
-		exprlist = lappend(exprlist, param);
-	}
-
-	*operators = oplist;
-	*param_exprs = exprlist;
-
-	return true;				/* all params can be hashed */
-}
-
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -311,30 +243,30 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	 * regardless. It may be useful if we can only do this when it seems
 	 * likely that we'll get some repeat lookups, i.e. cache hits.
 	 */
-	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
-	{
-		List	   *operators;
-		List	   *param_exprs;
-
-		/* Determine if all the subplan parameters can be hashed */
-		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
-		{
-			ResultCachePath *cache_path;
-
-			/*
-			 * Pass -1 for the number of calls since we don't have any ideas
-			 * what that'll be.
-			 */
-			cache_path = create_resultcache_path(root,
-												 best_path->parent,
-												 best_path,
-												 param_exprs,
-												 operators,
-												 false,
-												 -1);
-			best_path = (Path *) cache_path;
-		}
-	}
+	//if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	//{
+	//	List	   *operators;
+	//	List	   *param_exprs;
+
+	//	/* Determine if all the subplan parameters can be hashed */
+	//	if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+	//	{
+	//		ResultCachePath *cache_path;
+
+	//		/*
+	//		 * Pass -1 for the number of calls since we don't have any ideas
+	//		 * what that'll be.
+	//		 */
+	//		cache_path = create_resultcache_path(root,
+	//											 best_path->parent,
+	//											 best_path,
+	//											 param_exprs,
+	//											 operators,
+	//											 false,
+	//											 -1);
+	//		best_path = (Path *) cache_path;
+	//	}
+	//}
 
 	plan = create_plan(subroot, best_path);
 
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
index d2f3ed9a74..440019d141 100644
--- a/src/include/executor/nodeResultCache.h
+++ b/src/include/executor/nodeResultCache.h
@@ -15,16 +15,11 @@
 
 #include "nodes/execnodes.h"
 
-extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecResultCacheFinishScan(ResultCacheState *rcstate);
+extern TupleTableSlot *ExecResultCache(ResultCacheState *rcstate, PlanState *innerPlan);
+extern ResultCacheState *ExecInitResultCache(NestLoop *node, PlanState *planstate, PlanState *cache_planstate);
 extern void ExecEndResultCache(ResultCacheState *node);
 extern void ExecReScanResultCache(ResultCacheState *node);
 extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
-extern void ExecResultCacheEstimate(ResultCacheState *node,
-									ParallelContext *pcxt);
-extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
-										 ParallelContext *pcxt);
-extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
-											ParallelWorkerContext *pwcxt);
-extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
 
 #endif							/* NODERESULTCACHE_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 30f66d5058..a2a70151c9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1855,12 +1855,15 @@ typedef struct JoinState
  *		NullInnerTupleSlot prepared null tuple for left outer joins
  * ----------------
  */
+struct ResultCacheState;
 typedef struct NestLoopState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	bool		nl_NeedNewOuter;
 	bool		nl_MatchedOuter;
+	bool		nl_ParamCache;
 	TupleTableSlot *nl_NullInnerTupleSlot;
+	struct ResultCacheState *nl_pcache;
 } NestLoopState;
 
 /* ----------------
@@ -2022,12 +2025,15 @@ typedef struct SharedResultCacheInfo
  */
 typedef struct ResultCacheState
 {
-	ScanState	ss;				/* its first field is NodeTag */
+	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
+	//ScanState	ss;				/* its first field is NodeTag */
 	int			rc_status;		/* value of ExecResultCache's state machine */
 	int			nkeys;			/* number of hash table keys */
 	struct resultcache_hash *hashtable; /* hash table cache entries */
 	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
 	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *cachefoundslot; /* Slot to return found cache entries */
+	TupleTableSlot *cachefoundslotmin; /* Slot to return found cache entries */
 	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
 	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
 	ExprState **param_exprs;	/* exprs containing the parameters to this
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 79a4ad20dd..31b158026c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1546,6 +1546,16 @@ typedef struct JoinPath
 
 	List	   *joinrestrictinfo;	/* RestrictInfos to apply to join */
 
+	bool		use_cache;
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+
 	/*
 	 * See the notes for RelOptInfo and ParamPathInfo to understand why
 	 * joinrestrictinfo is needed in JoinPath, and can't be merged into the
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index ac5685da64..f989d31033 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -701,6 +701,18 @@ typedef struct NestLoop
 {
 	Join		join;
 	List	   *nestParams;		/* list of NestLoopParam nodes */
+	bool		paramcache;
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* cache keys */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
 } NestLoop;
 
 typedef struct NestLoopParam
#44David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#43)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Sat, 29 Aug 2020 at 02:54, David Rowley <dgrowleyml@gmail.com> wrote:

I'm open to ideas to make the comparison fairer.

While on that, it's not just queries that don't require the cached
tuple to be deformed that are slower. Here are a couple of examples that
do require both patches to deform the cached tuple:

Some other results where both patches deform the cached tuples
still show that v7 is faster:

Query1:

v7 + attached patch
postgres=# explain (analyze, timing off) select count(l.a) from
hundredk hk inner join lookup100 l on hk.one = l.a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=378570.41..378570.42 rows=1 width=8) (actual rows=1 loops=1)
-> Nested Loop Cached (cost=0.43..353601.00 rows=9987763 width=4)
(actual rows=10000000 loops=1)
Cache Key: $0
Hits: 99999 Misses: 1 Evictions: 0 Overflows: 0
-> Seq Scan on hundredk hk (cost=0.00..1637.00 rows=100000
width=4) (actual rows=100000 loops=1)
-> Index Only Scan using lookup100_a_idx on lookup100 l
(cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
Index Cond: (a = hk.one)
Heap Fetches: 0
Planning Time: 0.050 ms
Execution Time: 928.698 ms
(10 rows)

v7 only:
postgres=# explain (analyze, timing off) select count(l.a) from
hundredk hk inner join lookup100 l on hk.one = l.a;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=152861.19..152861.20 rows=1 width=8) (actual rows=1 loops=1)
-> Nested Loop (cost=0.45..127891.79 rows=9987763 width=4)
(actual rows=10000000 loops=1)
-> Seq Scan on hundredk hk (cost=0.00..1637.00 rows=100000
width=4) (actual rows=100000 loops=1)
-> Result Cache (cost=0.45..2.53 rows=100 width=4) (actual
rows=100 loops=100000)
Cache Key: hk.one
Hits: 99999 Misses: 1 Evictions: 0 Overflows: 0
-> Index Only Scan using lookup100_a_idx on lookup100
l (cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
Index Cond: (a = hk.one)
Heap Fetches: 0
Planning Time: 0.604 ms
Execution Time: 897.958 ms
(11 rows)

Query2:

v7 + attached patch
postgres=# explain (analyze, timing off) select * from hundredk hk
inner join lookup100 l on hk.one = l.a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Nested Loop Cached (cost=0.43..353601.00 rows=9987763 width=28)
(actual rows=10000000 loops=1)
Cache Key: $0
Hits: 99999 Misses: 1 Evictions: 0 Overflows: 0
-> Seq Scan on hundredk hk (cost=0.00..1637.00 rows=100000
width=24) (actual rows=100000 loops=1)
-> Index Only Scan using lookup100_a_idx on lookup100 l
(cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
Index Cond: (a = hk.one)
Heap Fetches: 0
Planning Time: 0.621 ms
Execution Time: 883.610 ms
(9 rows)

v7 only:
postgres=# explain (analyze, timing off) select * from hundredk hk
inner join lookup100 l on hk.one = l.a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.45..127891.79 rows=9987763 width=28) (actual
rows=10000000 loops=1)
-> Seq Scan on hundredk hk (cost=0.00..1637.00 rows=100000
width=24) (actual rows=100000 loops=1)
-> Result Cache (cost=0.45..2.53 rows=100 width=4) (actual
rows=100 loops=100000)
Cache Key: hk.one
Hits: 99999 Misses: 1 Evictions: 0 Overflows: 0
-> Index Only Scan using lookup100_a_idx on lookup100 l
(cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
Index Cond: (a = hk.one)
Heap Fetches: 0
Planning Time: 0.088 ms
Execution Time: 870.601 ms
(10 rows)

David

#45Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#32)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, Aug 19, 2020 at 6:58 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-19, David Rowley wrote:

Andres' suggestion:
regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Nested Loop (actual rows=10000 loops=1)
Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
-> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
Index Cond: (unique1 = t1.twenty)
(6 rows)

I think it doesn't look terrible in the SubPlan case -- it kinda makes
sense there -- but for nested loop it appears really strange.

I disagree. I don't know why anyone should find this confusing, except
that we're not used to seeing it. It seems to make a lot of sense that
if you are executing the same plan tree with different parameters, you
might want to cache results to avoid recomputation. So why wouldn't
nodes that do this include a cache?

This is not necessarily a vote for Andres's proposal. I don't know
whether it's technically better to include the caching in the Nested
Loop node or to make it a separate node, and I think we should do the
one that's better. Getting pushed into an inferior design because we
think the EXPLAIN output will be clearer does not make sense to me.

I think David's points elsewhere on the thread about ProjectSet and
Materialize nodes are interesting. It was never very clear to me why
ProjectSet was handled separately in every node, adding quite a bit of
complexity, and why Materialize was a separate node. Likewise, why are
Hash Join and Hash two separate nodes instead of just one? Why do we
not treat projection as a separate node type even when we're not
projecting a set? In general, I've never really understood why we
choose to include some functionality in other nodes and keep other
things separate. Is there even an organizing principle, or is it just
historical baggage?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#46Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#45)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Sat, Aug 29, 2020 at 3:33 AM Robert Haas <robertmhaas@gmail.com> wrote:

I think David's points elsewhere on the thread about ProjectSet and
Materialize nodes are interesting.

Indeed, I'm now finding it very difficult to look past the similarity with:

postgres=# explain select count(*) from t t1 cross join t t2;
QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=1975482.56..1975482.57 rows=1 width=8)
-> Nested Loop (cost=0.00..1646293.50 rows=131675625 width=0)
-> Seq Scan on t t1 (cost=0.00..159.75 rows=11475 width=0)
-> Materialize (cost=0.00..217.12 rows=11475 width=0)
-> Seq Scan on t t2 (cost=0.00..159.75 rows=11475 width=0)
(5 rows)

I wonder what it would take to overcome the overheads of the separate
Result Cache node, with techniques to step out of the way or something
like that.

[tricky philosophical questions about ancient and maybe in some cases arbitrary choices]

Ack.

#47David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#46)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for chipping in here.

On Mon, 31 Aug 2020 at 17:57, Thomas Munro <thomas.munro@gmail.com> wrote:

I wonder what it would take to overcome the overheads of the separate
Result Cache node, with techniques to step out of the way or something
like that.

So far it looks like there are more overheads to having the caching
done inside nodeNestloop.c. See [1]/messages/by-id/CAApHDvo2acQSogMCa3hB7moRntXWHO8G+WSwhyty2+c8vYRq3A@mail.gmail.com. Perhaps there's something that
can be done to optimise away the needless MinimalTuple deform that I
mentioned there, but for now, performance-wise, we're better off
having a separate node.
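
To show where the extra deform comes from, here's a simplified sketch
(not the actual patch code; the slot variable names are made up):

/*
 * Separate Result Cache node: the parent just references the cache's
 * MinimalTuple result slot, so columns are only deformed as far as the
 * expressions above actually need.
 */
slot_getsomeattrs(cache_result_slot, last_inner_attnum_needed);

/*
 * Caching inside nodeNestloop.c: each cached MinimalTuple gets copied
 * into the nested loop's virtual inner slot, and that copy deforms
 * every column whether the join uses it or not.
 */
ExecCopySlot(virtual_inner_slot, cache_result_slot);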

David

[1]: /messages/by-id/CAApHDvo2acQSogMCa3hB7moRntXWHO8G+WSwhyty2+c8vYRq3A@mail.gmail.com

#48David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#43)
1 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Sat, 29 Aug 2020 at 02:54, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 26 Aug 2020 at 03:52, Andres Freund <andres@anarazel.de> wrote:

There'll be a significant reduction in increase in performance.

So I did a very rough-cut change to the patch to have the caching be
part of Nested Loop. It can be applied on top of the other 3 v7
patches.

For the performance, the test I did results in the performance
actually being reduced from having the Result Cache as a separate
node. The reason for this is mostly because Nested Loop projects.

I spoke to Andres off-list this morning in regards to what can be done
to remove this performance regression over the separate Result Cache
node version of the patch. He mentioned that I could create another
ProjectionInfo for when reading from the cache's slot and use that to
project with.
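
Roughly, that means building a second projection against the cache's
MinimalTuple slot and switching to it whenever the inner tuple comes
from the cache. A very rough sketch of the idea (nl_cacheProjInfo and
from_cache are made-up names here; ExecBuildProjectionInfo(),
ExecProject() and the innerops/inneropsset fields are the existing
executor machinery):

/* At init time, after building the normal projection, build a second
 * one compiled to expect a MinimalTuple in the inner slot. */
nlstate->js.ps.innerops = &TTSOpsMinimalTuple;
nlstate->js.ps.inneropsset = true;
nlstate->nl_cacheProjInfo =
	ExecBuildProjectionInfo(node->join.plan.targetlist,
							nlstate->js.ps.ps_ExprContext,
							nlstate->js.ps.ps_ResultTupleSlot,
							&nlstate->js.ps,
							NULL);

/* At execution time, pick the projection that matches where the inner
 * tuple came from. */
if (from_cache)
	return ExecProject(nlstate->nl_cacheProjInfo);
else
	return ExecProject(nlstate->js.ps.ps_ProjInfo);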

I've hacked this up in the attached. It looks like another version of
the joinqual would also need to be created so that the MinimalTuple
from the cache is properly deformed. I've not done this yet.

The performance does improve this time. Using the same two test
queries from [1]/messages/by-id/CAApHDvqt5U6VcKSm2G9Q1n4rsHejL-VX7QG9KToAQ0HyZymSzQ@mail.gmail.com, I get:

v7 (Separate Result Cache node)

Query 1:
postgres=# explain (analyze, timing off) select count(l.a) from
hundredk hk inner join lookup100 l on hk.one = l.a;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=152861.19..152861.20 rows=1 width=8) (actual rows=1 loops=1)
   ->  Nested Loop  (cost=0.45..127891.79 rows=9987763 width=4) (actual rows=10000000 loops=1)
         ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=100000 width=4) (actual rows=100000 loops=1)
         ->  Result Cache  (cost=0.45..2.53 rows=100 width=4) (actual rows=100 loops=100000)
               Cache Key: hk.one
               Hits: 99999  Misses: 1  Evictions: 0  Overflows: 0
               ->  Index Only Scan using lookup100_a_idx on lookup100 l  (cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
                     Index Cond: (a = hk.one)
                     Heap Fetches: 0
 Planning Time: 0.045 ms
 Execution Time: 894.003 ms
(11 rows)

Query 2:
postgres=# explain (analyze, timing off) select * from hundredk hk
inner join lookup100 l on hk.one = l.a;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.45..127891.79 rows=9987763 width=28) (actual rows=10000000 loops=1)
   ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=100000 width=24) (actual rows=100000 loops=1)
   ->  Result Cache  (cost=0.45..2.53 rows=100 width=4) (actual rows=100 loops=100000)
         Cache Key: hk.one
         Hits: 99999  Misses: 1  Evictions: 0  Overflows: 0
         ->  Index Only Scan using lookup100_a_idx on lookup100 l  (cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
               Index Cond: (a = hk.one)
               Heap Fetches: 0
 Planning Time: 0.077 ms
 Execution Time: 854.950 ms
(10 rows)

v7 + hacks_V3 (caching done in Nested Loop)

Query 1:
explain (analyze, timing off) select count(l.a) from hundredk hk inner
join lookup100 l on hk.one = l.a;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=378570.41..378570.42 rows=1 width=8) (actual rows=1 loops=1)
   ->  Nested Loop Cached  (cost=0.43..353601.00 rows=9987763 width=4) (actual rows=10000000 loops=1)
         Cache Key: $0
         Hits: 99999  Misses: 1  Evictions: 0  Overflows: 0
         ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=100000 width=4) (actual rows=100000 loops=1)
         ->  Index Only Scan using lookup100_a_idx on lookup100 l  (cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
               Index Cond: (a = hk.one)
               Heap Fetches: 0
 Planning Time: 0.103 ms
 Execution Time: 770.470 ms
(10 rows)

Query 2:
explain (analyze, timing off) select * from hundredk hk inner join
lookup100 l on hk.one = l.a;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop Cached  (cost=0.43..353601.00 rows=9987763 width=28) (actual rows=10000000 loops=1)
   Cache Key: $0
   Hits: 99999  Misses: 1  Evictions: 0  Overflows: 0
   ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=100000 width=24) (actual rows=100000 loops=1)
   ->  Index Only Scan using lookup100_a_idx on lookup100 l  (cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
         Index Cond: (a = hk.one)
         Heap Fetches: 0
 Planning Time: 0.090 ms
 Execution Time: 779.181 ms
(9 rows)
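
(The hundredk and lookup100 table definitions aren't repeated in this
message. For anyone wanting plans of the same general shape, the
following is only a hypothetical approximation, inferred from the row
counts, widths and hit/miss figures in the plans above, not the actual
DDL used for these numbers:

-- hypothetical setup, inferred from the plans above
-- hundredk: 100k rows; "one" holds a single distinct value, so every
-- outer row probes the cache with the same key (Hits: 99999, Misses: 1)
create table hundredk (hundredk int, tenk int, thousand int,
                       hundred int, ten int, one int);
insert into hundredk
select x % 100000, x % 10000, x % 1000, x % 100, x % 10, 1
from generate_series(1, 100000) x;

-- lookup100: 100 rows per key value, so each cache entry holds 100 tuples
create table lookup100 (a int);
insert into lookup100
select x from generate_series(1, 100000) x, generate_series(1, 100) y;
create index lookup100_a_idx on lookup100 (a);

vacuum analyze hundredk;
vacuum analyze lookup100;
)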

Also, I'd just like to reiterate that the attached is a very rough cut
implementation that I've put together just to use for performance
comparison in order to help move this conversation along. (I do know
that I'm breaking the const qualifier on PlanState's innerops.)

David

[1]: /messages/by-id/CAApHDvqt5U6VcKSm2G9Q1n4rsHejL-VX7QG9KToAQ0HyZymSzQ@mail.gmail.com

Attachments:

resultcache_in_nestloop_hacks_v3.patch.txt (text/plain)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 6d4b9eb3b9..42c6df549f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,8 +108,7 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
-static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
-								  ExplainState *es);
+static void show_resultcache_info(NestLoopState *nlstate, List *ancestors, ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1494,10 +1493,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					 * For historical reasons, the join type is interpolated
 					 * into the node type name...
 					 */
-					if (((Join *) plan)->jointype != JOIN_INNER)
+					if (((Join *)plan)->jointype != JOIN_INNER)
 						appendStringInfo(es->str, " %s Join", jointype);
 					else if (!IsA(plan, NestLoop))
 						appendStringInfoString(es->str, " Join");
+					else if (castNode(NestLoop, plan)->paramcache)
+						appendStringInfoString(es->str, " Cached");
+
 				}
 				else
 					ExplainPropertyText("Join Type", jointype, es);
@@ -1883,6 +1885,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			break;
 		case T_NestLoop:
+			show_resultcache_info((NestLoopState *) planstate, ancestors, es);
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
 			if (((NestLoop *) plan)->join.joinqual)
@@ -1963,10 +1966,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
-		case T_ResultCache:
-			show_resultcache_info(castNode(ResultCacheState, planstate),
-								  ancestors, es);
-			break;
+		//case T_ResultCache:
+		//	show_resultcache_info(castNode(ResultCacheState, planstate),
+		//						  ancestors, es);
+		//	break;
 		default:
 			break;
 	}
@@ -3041,15 +3044,19 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 }
 
 static void
-show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+show_resultcache_info(NestLoopState *nlstate, List *ancestors, ExplainState *es)
 {
-	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	Plan	   *plan = ((PlanState *) nlstate)->plan;
+	ResultCacheState *rcstate;
 	ListCell   *lc;
 	List	   *context;
 	StringInfoData keystr;
 	char	   *seperator = "";
 	bool		useprefix;
 
+	if (nlstate->nl_pcache == NULL)
+		return;
+
 	initStringInfo(&keystr);
 
 	/* XXX surely we'll always have more than one if we have a resultcache? */
@@ -3060,7 +3067,7 @@ show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *
 									   plan,
 									   ancestors);
 
-	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	foreach(lc, ((NestLoop *) plan)->param_exprs)
 	{
 		Node	   *expr = (Node *) lfirst(lc);
 
@@ -3086,6 +3093,8 @@ show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *
 	if (!es->analyze)
 		return;
 
+
+	rcstate = nlstate->nl_pcache;
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 68920ecd89..f9c2f80c79 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,7 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
-#include "executor/nodeResultCache.h"
+//#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -250,9 +250,9 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
-		case T_ResultCacheState:
-			ExecReScanResultCache((ResultCacheState *) node);
-			break;
+		//case T_ResultCacheState:
+		//	ExecReScanResultCache((ResultCacheState *) node);
+		//	break;
 
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 459e9dd3e9..37cfa36881 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,7 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
-#include "executor/nodeResultCache.h"
+//#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -294,10 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
-		case T_ResultCacheState:
-			/* even when not parallel-aware, for EXPLAIN ANALYZE */
-			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
-			break;
+		//case T_ResultCacheState:
+		//	/* even when not parallel-aware, for EXPLAIN ANALYZE */
+		//	ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+		//	break;
 		default:
 			break;
 	}
@@ -518,10 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
-		case T_ResultCacheState:
-			/* even when not parallel-aware, for EXPLAIN ANALYZE */
-			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
-			break;
+		//case T_ResultCacheState:
+		//	/* even when not parallel-aware, for EXPLAIN ANALYZE */
+		//	ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+		//	break;
 		default:
 			break;
 	}
@@ -998,9 +998,9 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
-		case T_ResultCacheState:
-			/* these nodes have DSM state, but no reinitialization is required */
-			break;
+		//case T_ResultCacheState:
+		//	/* these nodes have DSM state, but no reinitialization is required */
+		//	break;
 
 		default:
 			break;
@@ -1068,9 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
-		case T_ResultCacheState:
-			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
-			break;
+		//case T_ResultCacheState:
+		//	ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+		//	break;
 		default:
 			break;
 	}
@@ -1363,11 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
-		case T_ResultCacheState:
-			/* even when not parallel-aware, for EXPLAIN ANALYZE */
-			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
-											pwcxt);
-			break;
+		//case T_ResultCacheState:
+		//	/* even when not parallel-aware, for EXPLAIN ANALYZE */
+		//	ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+		//									pwcxt);
+		//	break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index fbbe667cc1..e5b8c74da7 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,7 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
-#include "executor/nodeResultCache.h"
+//#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -320,10 +320,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
-		case T_ResultCache:
-			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
-													   estate, eflags);
-			break;
+		//case T_ResultCache:
+		//	result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+		//											   estate, eflags);
+		//	break;
 
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
@@ -709,9 +709,9 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
-		case T_ResultCacheState:
-			ExecEndResultCache((ResultCacheState *) node);
-			break;
+		//case T_ResultCacheState:
+		//	ExecEndResultCache((ResultCacheState *) node);
+		//	break;
 
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b07c2996d4..e407158e5e 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -23,9 +23,29 @@
 
 #include "executor/execdebug.h"
 #include "executor/nodeNestloop.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "utils/memutils.h"
 
+static inline TupleTableSlot *
+FetchInnerTuple(NestLoopState *nlstate, PlanState *innerPlan)
+{
+	ResultCacheState *rcstate = nlstate->nl_pcache;
+
+	/* No caching? Just exec the inner node */
+	if (rcstate == NULL)
+	{
+		nlstate->js.ps.ps_ProjInfo = nlstate->ps_ScanProjInfo;
+		return ExecProcNode(innerPlan);
+	}
+	/* Otherwise let the cache deal with it */
+	else
+	{
+		nlstate->js.ps.ps_ProjInfo = nlstate->ps_CacheProjInfo;
+		return ExecResultCache(rcstate, innerPlan);
+	}
+}
+
 
 /* ----------------------------------------------------------------
  *		ExecNestLoop(node)
@@ -150,6 +170,11 @@ ExecNestLoop(PlanState *pstate)
 			 */
 			ENL1_printf("rescanning inner plan");
 			ExecReScan(innerPlan);
+
+			/* When using a result cache, reset the state ready for another lookup */
+			if (node->nl_pcache)
+				ExecResultCacheFinishScan(node->nl_pcache);
+
 		}
 
 		/*
@@ -157,7 +182,7 @@ ExecNestLoop(PlanState *pstate)
 		 */
 		ENL1_printf("getting new inner tuple");
 
-		innerTupleSlot = ExecProcNode(innerPlan);
+		innerTupleSlot = FetchInnerTuple(node, innerPlan);
 		econtext->ecxt_innertuple = innerTupleSlot;
 
 		if (TupIsNull(innerTupleSlot))
@@ -306,6 +331,8 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 */
 	ExecInitResultTupleSlotTL(&nlstate->js.ps, &TTSOpsVirtual);
 	ExecAssignProjectionInfo(&nlstate->js.ps, NULL);
+	nlstate->ps_ScanProjInfo = nlstate->js.ps.ps_ProjInfo;
+
 
 	/*
 	 * initialize child expressions
@@ -345,6 +372,42 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 */
 	nlstate->nl_NeedNewOuter = true;
 	nlstate->nl_MatchedOuter = false;
+	nlstate->nl_ParamCache = node->paramcache;
+
+	/* Setup the result cache if enabled */
+	if (nlstate->nl_ParamCache)
+	{
+		nlstate->nl_pcache = ExecInitResultCache(node, (PlanState *)nlstate, (PlanState *) innerPlanState(nlstate));
+
+		/*
+		 * Create a seperate Projection info for projecting from the slots
+		 * belonging to the result cache.
+		 */
+		if (nlstate->js.ps.innerops != &TTSOpsMinimalTuple)
+		{
+			TupleTableSlotOps *ttsops = nlstate->js.ps.innerops;
+			bool inneropsset = nlstate->js.ps.inneropsset;
+
+			nlstate->js.ps.innerops = &TTSOpsMinimalTuple;
+			nlstate->js.ps.inneropsset = true;
+
+			nlstate->ps_CacheProjInfo = ExecBuildProjectionInfo(nlstate->js.ps.plan->targetlist,
+																nlstate->js.ps.ps_ExprContext,
+																nlstate->js.ps.ps_ResultTupleSlot,
+																&nlstate->js.ps,
+																NULL);
+
+			/* Restore original values */
+			nlstate->js.ps.innerops = ttsops;
+			nlstate->js.ps.inneropsset = inneropsset;
+		}
+
+	}
+	else
+	{
+		nlstate->nl_pcache = NULL;
+		nlstate->ps_CacheProjInfo = NULL;
+	}
 
 	NL1_printf("ExecInitNestLoop: %s\n",
 			   "node initialized");
@@ -352,6 +415,7 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	return nlstate;
 }
 
+
 /* ----------------------------------------------------------------
  *		ExecEndNestLoop
  *
@@ -380,6 +444,9 @@ ExecEndNestLoop(NestLoopState *node)
 	ExecEndNode(outerPlanState(node));
 	ExecEndNode(innerPlanState(node));
 
+	if (node->nl_pcache)
+		ExecEndResultCache(node->nl_pcache);
+
 	NL1_printf("ExecEndNestLoop: %s\n",
 			   "node processing ended");
 }
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 09b25ea184..3549ee9ae1 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -66,7 +66,6 @@
 										 * subplan without caching anything */
 #define RC_END_OF_SCAN				5	/* Ready for rescan */
 
-
 /* Helper macros for memory accounting */
 #define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
 										 sizeof(ResultCacheKey) + \
@@ -179,7 +178,7 @@ ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
 					  const ResultCacheKey *key2)
 {
 	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
-	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	ExprContext *econtext = rcstate->ps_ExprContext;
 	TupleTableSlot *tslot = rcstate->tableslot;
 	TupleTableSlot *pslot = rcstate->probeslot;
 
@@ -223,7 +222,7 @@ prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
 		/* Set the probeslot's values based on the current parameter values */
 		for (int i = 0; i < numKeys; i++)
 			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
-												rcstate->ss.ps.ps_ExprContext,
+												rcstate->ps_ExprContext,
 												&pslot->tts_isnull[i]);
 	}
 	else
@@ -243,7 +242,7 @@ prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
  *		Remove all tuples from a cache entry, leaving an empty cache entry.
  *		Also update memory accounting to reflect the removal of the tuples.
  */
-static inline void
+static void
 entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
 {
 	ResultCacheTuple *tuple = entry->tuplehead;
@@ -590,21 +589,32 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	return true;
 }
 
-static TupleTableSlot *
-ExecResultCache(PlanState *pstate)
+/*
+ * Caller to call this after it finishes a parameterized scan
+ */
+void
+ExecResultCacheFinishScan(ResultCacheState *rcstate)
+{
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	rcstate->entry = NULL;
+	rcstate->last_tuple = NULL;
+}
+
+TupleTableSlot *
+ExecResultCache(ResultCacheState *rcstate, PlanState *innerPlan)
 {
-	ResultCacheState *node = castNode(ResultCacheState, pstate);
-	PlanState  *outerNode;
 	TupleTableSlot *slot;
 
-	switch (node->rc_status)
+	switch (rcstate->rc_status)
 	{
 		case RC_CACHE_LOOKUP:
 			{
 				ResultCacheEntry *entry;
 				bool		found;
 
-				Assert(node->entry == NULL);
+				Assert(rcstate->entry == NULL);
 
 				/*
 				 * We're only ever in this state for the first call of the
@@ -619,44 +629,44 @@ ExecResultCache(PlanState *pstate)
 				 * one there, we'll try to cache it.
 				 */
 
-				/* see if we've got anything cached for the current parameters */
-				entry = cache_lookup(node, &found);
+				 /* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(rcstate, &found);
 
 				if (found && entry->complete)
 				{
-					node->stats.cache_hits += 1;	/* stats update */
+					rcstate->stats.cache_hits += 1;	/* stats update */
 
 					/*
 					 * Set last_tuple and entry so that the state
 					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
 					 * tuple for these parameters.
 					 */
-					node->last_tuple = entry->tuplehead;
-					node->entry = entry;
+					rcstate->last_tuple = entry->tuplehead;
+					rcstate->entry = entry;
 
 					/* Fetch the first cached tuple, if there is one */
 					if (entry->tuplehead)
 					{
-						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
-
-						slot = node->ss.ps.ps_ResultTupleSlot;
-						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
-											  slot, false);
+						rcstate->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
 
+						ExecClearTuple(rcstate->cachefoundslot);
+						slot = rcstate->cachefoundslotmin;
+						ExecStoreMinimalTuple(rcstate->last_tuple->mintuple, slot, false);
 						return slot;
+						//return ExecCopySlot(rcstate->cachefoundslot, slot);
 					}
 					else
 					{
 						/* The cache entry is void of any tuples. */
-						node->rc_status = RC_END_OF_SCAN;
+						rcstate->rc_status = RC_END_OF_SCAN;
 						return NULL;
 					}
 				}
 				else
 				{
-					TupleTableSlot *outerslot;
+					TupleTableSlot *innerslot;
 
-					node->stats.cache_misses += 1;	/* stats update */
+					rcstate->stats.cache_misses += 1;	/* stats update */
 
 					if (found)
 					{
@@ -668,13 +678,12 @@ ExecResultCache(PlanState *pstate)
 						 * guarantee the outer node will produce the tuples in
 						 * the same order as it did last time.
 						 */
-						entry_purge_tuples(node, entry);
+						entry_purge_tuples(rcstate, entry);
 					}
 
 					/* Scan the outer node for a tuple to cache */
-					outerNode = outerPlanState(node);
-					outerslot = ExecProcNode(outerNode);
-					if (TupIsNull(outerslot))
+					innerslot = ExecProcNode(innerPlan);
+					if (TupIsNull(innerslot))
 					{
 						/*
 						 * cache_lookup may have returned NULL due to failure
@@ -686,22 +695,22 @@ ExecResultCache(PlanState *pstate)
 						if (likely(entry))
 							entry->complete = true;
 
-						node->rc_status = RC_END_OF_SCAN;
+						rcstate->rc_status = RC_END_OF_SCAN;
 						return NULL;
 					}
 
-					node->entry = entry;
+					rcstate->entry = entry;
 
 					/*
 					 * If we failed to create the entry or failed to store the
 					 * tuple in the entry, then go into bypass mode.
 					 */
 					if (unlikely(entry == NULL ||
-								 !cache_store_tuple(node, outerslot)))
+						!cache_store_tuple(rcstate, innerslot)))
 					{
-						node->stats.cache_overflows += 1;	/* stats update */
+						rcstate->stats.cache_overflows += 1;	/* stats update */
 
-						node->rc_status = RC_CACHE_BYPASS_MODE;
+						rcstate->rc_status = RC_CACHE_BYPASS_MODE;
 
 						/*
 						 * No need to clear out last_tuple as we'll stay in
@@ -716,43 +725,41 @@ ExecResultCache(PlanState *pstate)
 						 * allows cache lookups to work even when the scan has
 						 * not been executed to completion.
 						 */
-						entry->complete = node->singlerow;
-						node->rc_status = RC_FILLING_CACHE;
+						entry->complete = rcstate->singlerow;
+						rcstate->rc_status = RC_FILLING_CACHE;
 					}
 
-					slot = node->ss.ps.ps_ResultTupleSlot;
-					ExecCopySlot(slot, outerslot);
-					return slot;
+					return innerslot;
 				}
 			}
 
 		case RC_CACHE_FETCH_NEXT_TUPLE:
 			{
 				/* We shouldn't be in this state if these are not set */
-				Assert(node->entry != NULL);
-				Assert(node->last_tuple != NULL);
+				Assert(rcstate->entry != NULL);
+				Assert(rcstate->last_tuple != NULL);
 
 				/* Skip to the next tuple to output */
-				node->last_tuple = node->last_tuple->next;
+				rcstate->last_tuple = rcstate->last_tuple->next;
 
 				/* No more tuples in the cache */
-				if (node->last_tuple == NULL)
+				if (rcstate->last_tuple == NULL)
 				{
-					node->rc_status = RC_END_OF_SCAN;
+					rcstate->rc_status = RC_END_OF_SCAN;
 					return NULL;
 				}
 
-				slot = node->ss.ps.ps_ResultTupleSlot;
-				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
-									  false);
-
+				ExecClearTuple(rcstate->cachefoundslot);
+				slot = rcstate->cachefoundslotmin;
+				ExecStoreMinimalTuple(rcstate->last_tuple->mintuple, slot, false);
 				return slot;
+				//return ExecCopySlot(rcstate->cachefoundslot, slot);
 			}
 
 		case RC_FILLING_CACHE:
 			{
-				TupleTableSlot *outerslot;
-				ResultCacheEntry *entry = node->entry;
+				TupleTableSlot *innerslot;
+				ResultCacheEntry *entry = rcstate->entry;
 
 				/* entry should already have been set by RC_CACHE_LOOKUP */
 				Assert(entry != NULL);
@@ -762,13 +769,12 @@ ExecResultCache(PlanState *pstate)
 				 * miss and are populating the cache with the current scan
 				 * tuples.
 				 */
-				outerNode = outerPlanState(node);
-				outerslot = ExecProcNode(outerNode);
-				if (TupIsNull(outerslot))
+				innerslot = ExecProcNode(innerPlan);
+				if (TupIsNull(innerslot))
 				{
 					/* No more tuples.  Mark it as complete */
 					entry->complete = true;
-					node->rc_status = RC_END_OF_SCAN;
+					rcstate->rc_status = RC_END_OF_SCAN;
 					return NULL;
 				}
 				else
@@ -782,12 +788,12 @@ ExecResultCache(PlanState *pstate)
 						elog(ERROR, "cache entry already complete");
 
 					/* Record the tuple in the current cache entry */
-					if (unlikely(!cache_store_tuple(node, outerslot)))
+					if (unlikely(!cache_store_tuple(rcstate, innerslot)))
 					{
 						/* Couldn't store it?  Handle overflow */
-						node->stats.cache_overflows += 1;	/* stats update */
+						rcstate->stats.cache_overflows += 1;	/* stats update */
 
-						node->rc_status = RC_CACHE_BYPASS_MODE;
+						rcstate->rc_status = RC_CACHE_BYPASS_MODE;
 
 						/*
 						 * No need to clear out entry or last_tuple as we'll
@@ -795,32 +801,27 @@ ExecResultCache(PlanState *pstate)
 						 */
 					}
 
-					slot = node->ss.ps.ps_ResultTupleSlot;
-					ExecCopySlot(slot, outerslot);
-					return slot;
+					return innerslot;
 				}
 			}
 
 		case RC_CACHE_BYPASS_MODE:
 			{
-				TupleTableSlot *outerslot;
+				TupleTableSlot *innerslot;
 
 				/*
 				 * When in bypass mode we just continue to read tuples without
 				 * caching.  We need to wait until the next rescan before we
 				 * can come out of this mode.
 				 */
-				outerNode = outerPlanState(node);
-				outerslot = ExecProcNode(outerNode);
-				if (TupIsNull(outerslot))
+				innerslot = ExecProcNode(innerPlan);
+				if (TupIsNull(innerslot))
 				{
-					node->rc_status = RC_END_OF_SCAN;
+					rcstate->rc_status = RC_END_OF_SCAN;
 					return NULL;
 				}
 
-				slot = node->ss.ps.ps_ResultTupleSlot;
-				ExecCopySlot(slot, outerslot);
-				return slot;
+				return innerslot;
 			}
 
 		case RC_END_OF_SCAN:
@@ -833,60 +834,34 @@ ExecResultCache(PlanState *pstate)
 
 		default:
 			elog(ERROR, "unrecognized resultcache state: %d",
-				 (int) node->rc_status);
+				 (int) rcstate->rc_status);
 			return NULL;
 	}							/* switch */
 }
 
 ResultCacheState *
-ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+ExecInitResultCache(NestLoop *node, PlanState *planstate, PlanState *cache_planstate)
 {
 	ResultCacheState *rcstate = makeNode(ResultCacheState);
-	Plan	   *outerNode;
 	int			i;
 	int			nkeys;
 	Oid		   *eqfuncoids;
 
-	/* check for unsupported flags */
-	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
-
-	rcstate->ss.ps.plan = (Plan *) node;
-	rcstate->ss.ps.state = estate;
-	rcstate->ss.ps.ExecProcNode = ExecResultCache;
-
-	/*
-	 * Miscellaneous initialization
-	 *
-	 * create expression context for node
-	 */
-	ExecAssignExprContext(estate, &rcstate->ss.ps);
-
-	outerNode = outerPlan(node);
-	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
-
-	/*
-	 * Initialize return slot and type. No need to initialize projection info
-	 * because this node doesn't do projections.
-	 */
-	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
-	rcstate->ss.ps.ps_ProjInfo = NULL;
-
-	/*
-	 * Initialize scan slot and type.
-	 */
-	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
-
-	/*
-	 * Set the state machine to lookup the cache.  We won't find anything
-	 * until we cache something, but this saves a special case to create the
-	 * first entry.
-	 */
+	rcstate->ps_ExprContext = CreateExprContext(planstate->state);
 	rcstate->rc_status = RC_CACHE_LOOKUP;
 
 	rcstate->nkeys = nkeys = node->numKeys;
 	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
 	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
 												  &TTSOpsMinimalTuple);
+	/* XXX this should make a slot the same type as cache_planstates result slot.  For now
+	 * that'll always be a nested loop, so just make a virtual slot, which is what nested loop
+	 * uses.
+	 */
+	rcstate->cachefoundslot = MakeSingleTupleTableSlot(cache_planstate->ps_ResultTupleDesc,
+		&TTSOpsVirtual);
+	rcstate->cachefoundslotmin = MakeSingleTupleTableSlot(cache_planstate->ps_ResultTupleDesc,
+		&TTSOpsMinimalTuple);
 	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
 												  &TTSOpsVirtual);
 
@@ -910,7 +885,7 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 
 		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
 
-		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *)planstate);
 		eqfuncoids[i] = get_opcode(hashop);
 	}
 
@@ -919,7 +894,7 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 													eqfuncoids,
 													node->collations,
 													node->param_exprs,
-													(PlanState *) rcstate);
+													(PlanState *) planstate);
 
 	pfree(eqfuncoids);
 	rcstate->mem_used = 0;
@@ -970,57 +945,12 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 void
 ExecEndResultCache(ResultCacheState *node)
 {
-	/*
-	 * When ending a parallel worker, copy the statistics gathered by the
-	 * worker back into shared memory so that it can be picked up by the main
-	 * process to report in EXPLAIN ANALYZE.
-	 */
-	if (node->shared_info && IsParallelWorker())
-	{
-		ResultCacheInstrumentation *si;
-
-		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
-		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
-	}
-
 	/* Remove the cache context */
 	MemoryContextDelete(node->tableContext);
 
-	ExecClearTuple(node->ss.ss_ScanTupleSlot);
-	/* must drop pointer to cache result tuple */
-	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
-	/*
-	 * free exprcontext
-	 */
-	ExecFreeExprContext(&node->ss.ps);
-
-	/*
-	 * shut down the subplan
-	 */
-	ExecEndNode(outerPlanState(node));
-}
-
-void
-ExecReScanResultCache(ResultCacheState *node)
-{
-	PlanState  *outerPlan = outerPlanState(node);
-
-	/* Mark that we must lookup the cache for a new set of parameters */
-	node->rc_status = RC_CACHE_LOOKUP;
-
-	/* nullify pointers used for the last scan */
-	node->entry = NULL;
-	node->last_tuple = NULL;
-
-	/*
-	 * if chgParam of subnode is not null then plan will be re-scanned by
-	 * first ExecProcNode.
-	 */
-	if (outerPlan->chgParam == NULL)
-		ExecReScan(outerPlan);
-
+	ExecClearTuple(node->cachefoundslot);
+	ExecClearTuple(node->cachefoundslotmin);
+	FreeExprContext(node->ps_ExprContext, false);
 }
 
 /*
@@ -1035,88 +965,3 @@ ExecEstimateCacheEntryOverheadBytes(double ntuples)
 		sizeof(ResultCacheTuple) * ntuples;
 }
 
-/* ----------------------------------------------------------------
- *						Parallel Query Support
- * ----------------------------------------------------------------
- */
-
- /* ----------------------------------------------------------------
-  *		ExecResultCacheEstimate
-  *
-  *		Estimate space required to propagate result cache statistics.
-  * ----------------------------------------------------------------
-  */
-void
-ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
-{
-	Size		size;
-
-	/* don't need this if not instrumenting or no workers */
-	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
-		return;
-
-	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
-	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
-	shm_toc_estimate_chunk(&pcxt->estimator, size);
-	shm_toc_estimate_keys(&pcxt->estimator, 1);
-}
-
-/* ----------------------------------------------------------------
- *		ExecResultCacheInitializeDSM
- *
- *		Initialize DSM space for result cache statistics.
- * ----------------------------------------------------------------
- */
-void
-ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
-{
-	Size		size;
-
-	/* don't need this if not instrumenting or no workers */
-	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
-		return;
-
-	size = offsetof(SharedResultCacheInfo, sinstrument)
-		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
-	node->shared_info = shm_toc_allocate(pcxt->toc, size);
-	/* ensure any unfilled slots will contain zeroes */
-	memset(node->shared_info, 0, size);
-	node->shared_info->num_workers = pcxt->nworkers;
-	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
-				   node->shared_info);
-}
-
-/* ----------------------------------------------------------------
- *		ExecResultCacheInitializeWorker
- *
- *		Attach worker to DSM space for result cache statistics.
- * ----------------------------------------------------------------
- */
-void
-ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
-{
-	node->shared_info =
-		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
-}
-
-/* ----------------------------------------------------------------
- *		ExecResultCacheRetrieveInstrumentation
- *
- *		Transfer result cache statistics from DSM to private memory.
- * ----------------------------------------------------------------
- */
-void
-ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
-{
-	Size		size;
-	SharedResultCacheInfo *si;
-
-	if (node->shared_info == NULL)
-		return;
-
-	size = offsetof(SharedResultCacheInfo, sinstrument)
-		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
-	si = palloc(size);
-	memcpy(si, node->shared_info, size);
-	node->shared_info = si;
-}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e50844df9b..0101d719c4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2298,148 +2298,6 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
-/*
- * cost_resultcache_rescan
- *	  Determines the estimated cost of rescanning a ResultCache node.
- *
- * In order to estimate this, we must gain knowledge of how often we expect to
- * be called and how many distinct sets of parameters we are likely to be
- * called with. If we expect a good cache hit ratio, then we can set our
- * costs to account for that hit ratio, plus a little bit of cost for the
- * caching itself.  Caching will not work out well if we expect to be called
- * with too many distinct parameter values.  The worst-case here is that we
- * never see the same parameter values twice, in which case we'd never get a
- * cache hit and caching would be a complete waste of effort.
- */
-static void
-cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
-						Cost *rescan_startup_cost, Cost *rescan_total_cost)
-{
-	Cost		input_startup_cost = rcpath->subpath->startup_cost;
-	Cost		input_total_cost = rcpath->subpath->total_cost;
-	double		tuples = rcpath->subpath->rows;
-	double		calls = rcpath->calls;
-	int			width = rcpath->subpath->pathtarget->width;
-	int			flags;
-
-	double		work_mem_bytes;
-	double		est_entry_bytes;
-	double		est_cache_entries;
-	double		ndistinct;
-	double		evict_ratio;
-	double		hit_ratio;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* available cache space */
-	work_mem_bytes = work_mem * 1024L;
-
-	/*
-	 * Set the number of bytes each cache entry should consume in the cache.
-	 * To provide us with better estimations on how many cache entries we can
-	 * store at once we make a call to the excutor here to ask it what memory
-	 * overheads there are for a single cache entry.
-	 *
-	 * XXX we also store the cache key, but that's not accounted for here.
-	 */
-	est_entry_bytes = relation_byte_size(tuples, width) +
-		ExecEstimateCacheEntryOverheadBytes(tuples);
-
-	/* estimate on the upper limit of cache entries we can hold at once */
-	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
-
-	/* estimate on the distinct number of parameter values */
-	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
-									&flags);
-
-	/*
-	 * When the estimation fell back on using a default value, it's a bit too
-	 * risky to assume that it's ok to use a Result Cache.  The use of a
-	 * default could cause us to use a Result Cache when it's really
-	 * inappropriate to do so.  If we see that this has been done then we'll
-	 * assume that every call will have unique parameters, which will almost
-	 * certainly mean a ResultCachePath will never survive add_path().
-	 */
-	if ((flags & SELFLAG_USED_DEFAULT) != 0)
-		ndistinct = calls;
-
-	/*
-	 * Since we've already estimated the maximum number of entries we can
-	 * store at once and know the estimated number of distinct values we'll be
-	 * called with, well take this opportunity to set the path's est_entries.
-	 * This will ultimately determine the hash table size that the executor
-	 * will use.  If we leave this at zero the executor will just choose the
-	 * size itself.  Really this is not the right place to do this, but it's
-	 * convenient since everything is already calculated.
-	 */
-	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
-							  PG_UINT32_MAX);
-
-
-	/*
-	 * When the number of distinct parameter values is above the amount we can
-	 * store in the cache, then we'll have to evict some entries from the
-	 * cache.  This is not free, so here we estimate how often we'll incur the
-	 * cost of that eviction.
-	 */
-	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
-
-	/*
-	 * In order to estimate how costly a single scan will be, we need to
-	 * attempt to estimate what the cache hit ratio will be.  To do that we
-	 * must look at how many scans are estimated in total of this node and how
-	 * many of those scans we expect to get a cache hit.
-	 */
-	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
-		(ndistinct / calls);
-
-	/* Ensure we don't go negative */
-	hit_ratio = Max(hit_ratio, 0);
-
-	/*
-	 * Set the total_cost accounting for the expected cache hit ratio.  We
-	 * also add on a cpu_operator_cost to account for a cache lookup. This
-	 * will happen regardless of if it's a cache hit or not.
-	 */
-	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
-
-	/* Now adjust the total cost to account for cache evictions */
-
-	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
-	total_cost += cpu_tuple_cost * evict_ratio;
-
-	/*
-	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
-	 * The per-tuple eviction is really just a pfree, so charging a whole
-	 * cpu_operator_cost seems a little excessive.
-	 */
-	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
-
-	/*
-	 * Now adjust for storing things in the cache, since that's not free
-	 * either.  Everything must go in the cache, so we don't proportion this
-	 * over any ratio, just apply it once for the scan.  We charge a
-	 * cpu_tuple_cost for the creation of the cache entry and also a
-	 * cpu_operator_cost for each tuple we expect to cache.
-	 */
-	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
-
-	/*
-	 * Getting the first row must be also be proportioned according to the
-	 * expected cache hit ratio.
-	 */
-	startup_cost = input_startup_cost * (1.0 - hit_ratio);
-
-	/*
-	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
-	 * which we'll do regardless of if it was a cache hit or not.
-	 */
-	startup_cost += cpu_tuple_cost;
-
-	*rescan_startup_cost = startup_cost;
-	*rescan_total_cost = total_cost;
-}
-
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4167,11 +4025,6 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
-		case T_ResultCache:
-			/* All the hard work is done by cost_resultcache_rescan */
-			cost_resultcache_rescan(root, (ResultCachePath *) path,
-									rescan_startup_cost, rescan_total_cost);
-			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index f4c76577ad..5918dd9a3a 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,13 +17,16 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "executor/nodeResultCache.h"
 #include "foreign/fdwapi.h"
+#include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
@@ -554,6 +557,152 @@ get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
 	return NULL;
 }
 
+static double
+relation_byte_size(double tuples, int width)
+{
+	return tuples * (MAXALIGN(width) + MAXALIGN(SizeofHeapTupleHeader));
+}
+
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static bool
+use_nestedloop_cache(PlannerInfo *root, NestPath *nlpath)
+{
+	Cost		input_startup_cost = nlpath->innerjoinpath->startup_cost;
+	Cost		input_total_cost = nlpath->innerjoinpath->total_cost;
+	double		tuples = nlpath->innerjoinpath->rows;
+	double		calls = nlpath->outerjoinpath->rows;
+	int			width = nlpath->innerjoinpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once we make a call to the excutor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = 1; // estimate_num_groups(root, nlpath->rcpath->param_exprs, calls, NULL,
+		//&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, well take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	//nlpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+	//	PG_UINT32_MAX);
+
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total of this node and how
+	 * many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of if it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must be also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of if it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	return total_cost < nlpath->innerjoinpath->total_cost;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -576,8 +725,7 @@ try_nestloop_path(PlannerInfo *root,
 	Relids		outerrelids;
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
-	Path	   *inner_cache_path;
-	bool		added_path = false;
+	ResultCachePath	   *rcpath;
 
 	/*
 	 * Paths are parameterized by top-level parents, so run parameterization
@@ -628,6 +776,7 @@ try_nestloop_path(PlannerInfo *root,
 						  workspace.startup_cost, workspace.total_cost,
 						  pathkeys, required_outer))
 	{
+		NestPath *nlpath;
 		/*
 		 * If the inner path is parameterized, it is parameterized by the
 		 * topmost parent of the outer rel, not the outer rel itself.  Fix
@@ -649,103 +798,37 @@ try_nestloop_path(PlannerInfo *root,
 			}
 		}
 
-		add_path(joinrel, (Path *)
-				 create_nestloop_path(root,
-									  joinrel,
-									  jointype,
-									  &workspace,
-									  extra,
-									  outer_path,
-									  inner_path,
-									  extra->restrictlist,
-									  pathkeys,
-									  required_outer));
-		added_path = true;
-	}
-
-	/*
-	 * See if we can build a result cache path for this inner_path. That might
-	 * make the nested loop cheaper.
-	 */
-	inner_cache_path = get_resultcache_path(root, innerrel, outerrel,
-											inner_path, outer_path, jointype,
-											extra);
-
-	if (inner_cache_path == NULL)
-	{
-		if (!added_path)
-			bms_free(required_outer);
-		return;
-	}
-
-	initial_cost_nestloop(root, &workspace, jointype,
-						  outer_path, inner_cache_path, extra);
-
-	if (add_path_precheck(joinrel,
-						  workspace.startup_cost, workspace.total_cost,
-						  pathkeys, required_outer))
-	{
 		/*
-		 * If the inner path is parameterized, it is parameterized by the
-		 * topmost parent of the outer rel, not the outer rel itself.  Fix
-		 * that.
+		 * See if we can build a result cache path for this inner_path. That might
+		 * make the nested loop cheaper.
 		 */
-		if (PATH_PARAM_BY_PARENT(inner_cache_path, outer_path->parent))
+		rcpath = (ResultCachePath *) get_resultcache_path(root, innerrel, outerrel,
+			inner_path, outer_path, jointype,
+			extra);
+
+		nlpath = create_nestloop_path(root,
+			joinrel,
+			jointype,
+			&workspace,
+			extra,
+			outer_path,
+			inner_path,
+			extra->restrictlist,
+			pathkeys,
+			required_outer);
+
+		if (rcpath != NULL)
 		{
-			Path	   *reparameterize_path;
-
-			reparameterize_path = reparameterize_path_by_child(root,
-															   inner_cache_path,
-															   outer_path->parent);
-
-			/*
-			 * If we could not translate the path, we can't create nest loop
-			 * path.
-			 */
-			if (!reparameterize_path)
-			{
-				ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
-
-				/* Waste no memory when we reject a path here */
-				list_free(rcpath->hash_operators);
-				list_free(rcpath->param_exprs);
-				pfree(rcpath);
-
-				if (!added_path)
-					bms_free(required_outer);
-				return;
-			}
+			nlpath->use_cache = true;
+			nlpath->hash_operators = rcpath->hash_operators;
+			nlpath->param_exprs = rcpath->param_exprs;
+			nlpath->singlerow = rcpath->singlerow;
+			nlpath->calls = rcpath->calls;
+			nlpath->est_entries = rcpath->est_entries;
 		}
 
-		add_path(joinrel, (Path *)
-				 create_nestloop_path(root,
-									  joinrel,
-									  jointype,
-									  &workspace,
-									  extra,
-									  outer_path,
-									  inner_cache_path,
-									  extra->restrictlist,
-									  pathkeys,
-									  required_outer));
-		added_path = true;
+		add_path(joinrel, (Path *)nlpath);
 	}
-	else
-	{
-		ResultCachePath *rcpath = (ResultCachePath *) inner_cache_path;
-
-		/* Waste no memory when we reject a path here */
-		list_free(rcpath->hash_operators);
-		list_free(rcpath->param_exprs);
-		pfree(rcpath);
-	}
-
-	if (!added_path)
-	{
-		/* Waste no memory when we reject a path here */
-		bms_free(required_outer);
-	}
-
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 45e211262a..7afb7741d0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4147,6 +4147,7 @@ create_nestloop_plan(PlannerInfo *root,
 	Relids		outerrelids;
 	List	   *nestParams;
 	Relids		saveOuterRels = root->curOuterRels;
+	List	   *param_exprs = NIL;
 
 	/* NestLoop can project, so no need to be picky about child tlists */
 	outer_plan = create_plan_recurse(root, best_path->outerjoinpath, 0);
@@ -4157,6 +4158,9 @@ create_nestloop_plan(PlannerInfo *root,
 
 	inner_plan = create_plan_recurse(root, best_path->innerjoinpath, 0);
 
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
 	/* Restore curOuterRels */
 	bms_free(root->curOuterRels);
 	root->curOuterRels = saveOuterRels;
@@ -4204,6 +4208,54 @@ create_nestloop_plan(PlannerInfo *root,
 							  best_path->jointype,
 							  best_path->inner_unique);
 
+	//bool		paramcache;
+	//int			numKeys;		/* size of the two arrays below */
+
+	//Oid		   *hashOperators;	/* hash operators for each key */
+	//Oid		   *collations;		/* cache keys */
+	//List	   *param_exprs;	/* exprs containing parameters */
+	//bool		singlerow;		/* true if the cache entry should be marked as
+	//							 * complete after we store the first tuple in
+	//							 * it. */
+	//uint32		est_entries;	/* The maximum number of entries that the
+	//							 * planner expects will fit in the cache, or 0
+	//							 * if unknown */
+
+	if (best_path->use_cache)
+	{
+		Oid		   *operators;
+		Oid		   *collations;
+		ListCell   *lc;
+		ListCell   *lc2;
+		int			nkeys;
+		int			i;
+
+		join_plan->numKeys = list_length(best_path->param_exprs);
+
+		nkeys = list_length(param_exprs);
+		Assert(nkeys > 0);
+		operators = palloc(nkeys * sizeof(Oid));
+		collations = palloc(nkeys * sizeof(Oid));
+
+		i = 0;
+		forboth(lc, param_exprs, lc2, best_path->hash_operators)
+		{
+			Expr	   *param_expr = (Expr *)lfirst(lc);
+			Oid			opno = lfirst_oid(lc2);
+
+			operators[i] = opno;
+			collations[i] = exprCollation((Node *)param_expr);
+			i++;
+		}
+		join_plan->paramcache = true;
+		join_plan->param_exprs = param_exprs;
+		join_plan->hashOperators = operators;
+		join_plan->collations = collations;
+		join_plan->singlerow = best_path->singlerow;
+		join_plan->est_entries = best_path->est_entries;
+
+	}
+
 	copy_generic_path_info(&join_plan->join.plan, &best_path->path);
 
 	return join_plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3e2c61b0a0..9da223139a 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -137,74 +137,6 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
-
-/*
- * outer_params_hashable
- *		Determine if it's valid to use a ResultCache node to cache already
- *		seen rows matching a given set of parameters instead of performing a
- *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
- *		if all parameters required by this query level can be hashed.  If so,
- *		return true and set 'operators' to the list of hash equality operators
- *		for the given parameters then populate 'param_exprs' with each
- *		PARAM_EXEC parameter that the subplan requires the outer query to pass
- *		it.  When hashing is not possible, false is returned and the two
- *		output lists are unchanged.
- */
-static bool
-outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
-					  List **param_exprs)
-{
-	List	   *oplist = NIL;
-	List	   *exprlist = NIL;
-	ListCell   *lc;
-
-	/* Ensure we're not given a top-level query. */
-	Assert(subroot->parent_root != NULL);
-
-	/*
-	 * It's not valid to use a Result Cache node if there are any volatile
-	 * function in the subquery.  Caching could cause fewer evaluations of
-	 * volatile functions that have side-effects
-	 */
-	if (contain_volatile_functions((Node *) subroot->parse))
-		return false;
-
-	foreach(lc, plan_params)
-	{
-		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
-		TypeCacheEntry *typentry;
-		Node	   *expr = ppi->item;
-		Param	   *param;
-
-		param = makeNode(Param);
-		param->paramkind = PARAM_EXEC;
-		param->paramid = ppi->paramId;
-		param->paramtype = exprType(expr);
-		param->paramtypmod = exprTypmod(expr);
-		param->paramcollid = exprCollation(expr);
-		param->location = -1;
-
-		typentry = lookup_type_cache(param->paramtype,
-									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
-
-		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
-		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
-		{
-			list_free(oplist);
-			list_free(exprlist);
-			return false;
-		}
-
-		oplist = lappend_oid(oplist, typentry->eq_opr);
-		exprlist = lappend(exprlist, param);
-	}
-
-	*operators = oplist;
-	*param_exprs = exprlist;
-
-	return true;				/* all params can be hashed */
-}
-
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -311,30 +243,30 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	 * regardless. It may be useful if we can only do this when it seems
 	 * likely that we'll get some repeat lookups, i.e. cache hits.
 	 */
-	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
-	{
-		List	   *operators;
-		List	   *param_exprs;
-
-		/* Determine if all the subplan parameters can be hashed */
-		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
-		{
-			ResultCachePath *cache_path;
-
-			/*
-			 * Pass -1 for the number of calls since we don't have any ideas
-			 * what that'll be.
-			 */
-			cache_path = create_resultcache_path(root,
-												 best_path->parent,
-												 best_path,
-												 param_exprs,
-												 operators,
-												 false,
-												 -1);
-			best_path = (Path *) cache_path;
-		}
-	}
+	//if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	//{
+	//	List	   *operators;
+	//	List	   *param_exprs;
+
+	//	/* Determine if all the subplan parameters can be hashed */
+	//	if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+	//	{
+	//		ResultCachePath *cache_path;
+
+	//		/*
+	//		 * Pass -1 for the number of calls since we don't have any ideas
+	//		 * what that'll be.
+	//		 */
+	//		cache_path = create_resultcache_path(root,
+	//											 best_path->parent,
+	//											 best_path,
+	//											 param_exprs,
+	//											 operators,
+	//											 false,
+	//											 -1);
+	//		best_path = (Path *) cache_path;
+	//	}
+	//}
 
 	plan = create_plan(subroot, best_path);
 
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
index d2f3ed9a74..440019d141 100644
--- a/src/include/executor/nodeResultCache.h
+++ b/src/include/executor/nodeResultCache.h
@@ -15,16 +15,11 @@
 
 #include "nodes/execnodes.h"
 
-extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecResultCacheFinishScan(ResultCacheState *rcstate);
+extern TupleTableSlot *ExecResultCache(ResultCacheState *rcstate, PlanState *innerPlan);
+extern ResultCacheState *ExecInitResultCache(NestLoop *node, PlanState *planstate, PlanState *cache_planstate);
 extern void ExecEndResultCache(ResultCacheState *node);
 extern void ExecReScanResultCache(ResultCacheState *node);
 extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
-extern void ExecResultCacheEstimate(ResultCacheState *node,
-									ParallelContext *pcxt);
-extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
-										 ParallelContext *pcxt);
-extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
-											ParallelWorkerContext *pwcxt);
-extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
 
 #endif							/* NODERESULTCACHE_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 30f66d5058..2fd1b2461d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1855,12 +1855,17 @@ typedef struct JoinState
  *		NullInnerTupleSlot prepared null tuple for left outer joins
  * ----------------
  */
+struct ResultCacheState;
 typedef struct NestLoopState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	bool		nl_NeedNewOuter;
 	bool		nl_MatchedOuter;
+	bool		nl_ParamCache;
 	TupleTableSlot *nl_NullInnerTupleSlot;
+	struct ResultCacheState *nl_pcache;
+	ProjectionInfo *ps_CacheProjInfo;	/* projection info for tuples from the cache */
+	ProjectionInfo *ps_ScanProjInfo;	/* projection info for tuples from the inner scan */
 } NestLoopState;
 
 /* ----------------
@@ -2022,12 +2027,15 @@ typedef struct SharedResultCacheInfo
  */
 typedef struct ResultCacheState
 {
-	ScanState	ss;				/* its first field is NodeTag */
+	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
+	//ScanState	ss;				/* its first field is NodeTag */
 	int			rc_status;		/* value of ExecResultCache's state machine */
 	int			nkeys;			/* number of hash table keys */
 	struct resultcache_hash *hashtable; /* hash table cache entries */
 	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
 	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *cachefoundslot; /* Slot to return found cache entries */
+	TupleTableSlot *cachefoundslotmin; /* Minimal tuple slot for found cache entries */
 	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
 	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
 	ExprState **param_exprs;	/* exprs containing the parameters to this
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 79a4ad20dd..31b158026c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1546,6 +1546,16 @@ typedef struct JoinPath
 
 	List	   *joinrestrictinfo;	/* RestrictInfos to apply to join */
 
+	bool		use_cache;
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+
 	/*
 	 * See the notes for RelOptInfo and ParamPathInfo to understand why
 	 * joinrestrictinfo is needed in JoinPath, and can't be merged into the
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index ac5685da64..f989d31033 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -701,6 +701,18 @@ typedef struct NestLoop
 {
 	Join		join;
 	List	   *nestParams;		/* list of NestLoopParam nodes */
+	bool		paramcache;
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
 } NestLoop;
 
 typedef struct NestLoopParam
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#48)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On 2020-Sep-02, David Rowley wrote:

v7 (Separate Result Cache node)
Query 1:
Execution Time: 894.003 ms

Query 2:
Execution Time: 854.950 ms

v7 + hacks_V3 (caching done in Nested Loop)
Query 1:
Execution Time: 770.470 ms

Query 2
Execution Time: 779.181 ms

Wow, this is a *significant* change.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#50David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#49)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 3 Sep 2020 at 01:49, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Sep-02, David Rowley wrote:

v7 (Separate Result Cache node)
Query 1:
Execution Time: 894.003 ms

Query 2:
Execution Time: 854.950 ms

v7 + hacks_V3 (caching done in Nested Loop)
Query 1:
Execution Time: 770.470 ms

Query 2
Execution Time: 779.181 ms

Wow, this is a *significant* change.

Yeah, it's more than I thought it was going to be. It seems I
misthought in [1]/messages/by-id/CAApHDvrX9o35_WUoL5c5arJ0XbJFN-cDHckjL57-PR-Keeypdw@mail.gmail.com where I mentioned:

With a hit ratio of 90% we'll
only pull 10% of tuples through the additional node, so that's about
1.2 nanoseconds per tuple, or 1.2 milliseconds per million tuples. It
might become hard to measure above the noise. More costly inner scans
will have the planner choose to Result Cache with lower estimated hit
ratios, but in that case, pulling the tuple through the additional
node during a cache miss will be less noticeable due to the more
costly inner side of the join.

This wasn't technically wrong. I just failed to consider that a cache
hit when the cache is built into Nested Loop requires looking at no
other node. The tuples are right there in the cache, 90% of the time,
in this example. No need to execute any nodes to get at them.

I have come around a bit to Andres' idea. But we'd need to display the
nested loop node as something like "Cacheable Nested Loop" in EXPLAIN
so that we could easily identify what's going on. Not sure if the word
"Hash" would be better to inject in the name somewhere rather than
"Cacheable".

I've not done any further work to shift the patch any further in that
direction yet. I know it's going to be quite a bit of work and it
sounds like there are still objections in both directions. I'd rather
everyone agreed on something before I go to the trouble of trying to
make something committable with Andres' way.

Tom, I'm wondering if you'd still be against this if Nested Loop
showed a different name in EXPLAIN when it was using caching? Or are
you also concerned about adding unrelated code into nodeNestloop.c?
If so, I'm wondering if adding a completely new node like
nodeNestcacheloop.c would be more acceptable. But that's going to add
lots of boilerplate code that we'd otherwise get away with not having.

In the meantime, I did change a couple of things with the current
separate node version. It's just around how the path stuff works in
the planner. I'd previously modified try_nestloop_path() to try a
Result Cache, but I noticed more recently that's not how it's done for
Materialize. So in the attached, I've just aligned it to how
non-parameterized Nested Loops with a Materialized inner side work.

David

[1]: /messages/by-id/CAApHDvrX9o35_WUoL5c5arJ0XbJFN-cDHckjL57-PR-Keeypdw@mail.gmail.com

Attachments:

v8-0001-Allow-estimate_num_groups-to-pass-back-further-de.patchapplication/octet-stream; name=v8-0001-Allow-estimate_num_groups-to-pass-back-further-de.patchDownload
From 402a49142d412b4d47681570dcc21abb4a9451b7 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v8 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() which lets it
set bits in a flags variable in order to pass additional details back to
the caller that may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..0ee5d8f10a 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2959,7 +2959,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index cd3716d494..3d7f235645 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1864,7 +1864,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index bcb1bc6097..4f6ab5d635 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1986,6 +1986,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 139c5e3dc2..bd7f7d4e1a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3719,7 +3719,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3744,7 +3745,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3760,7 +3762,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4778,7 +4780,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 745f443e5c..f33033bc27 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c1fc866cbf..e528e05459 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 00c7afc66f..2f1c1b8ec4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3365,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3373,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3580,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.21.0.windows.1
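
As a rough illustration of the API change above (a sketch only, not part of
the patch): a planner-side caller could pass a flags pointer and check for
SELFLAG_USED_DEFAULT before trusting the estimate.  The function name
estimate_cache_keys and its arguments below are hypothetical stand-ins for
whatever a real call site has at hand.

#include "postgres.h"
#include "utils/selfuncs.h"

static double
estimate_cache_keys(PlannerInfo *root, List *param_exprs, double outer_rows)
{
	int			flags;
	double		ndistinct;

	/* estimate_num_groups() zeroes 'flags' before setting any bits */
	ndistinct = estimate_num_groups(root, param_exprs, outer_rows,
									NULL,	/* no grouping sets */
									&flags);

	/*
	 * If any part of the estimate fell back on the hard-coded defaults, a
	 * caller might choose to be pessimistic, e.g. assume every lookup uses
	 * a distinct parameter value.
	 */
	if (flags & SELFLAG_USED_DEFAULT)
		ndistinct = outer_rows;

	return ndistinct;
}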

v8-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patchapplication/octet-stream; name=v8-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patchDownload
From 14c271b0fade560a020a854354fcbb2ff0179b47 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v8 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
will already have a pointer to the element it would like to delete, and so
can remove it without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..dc1f1df07e 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.21.0.windows.1
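
For a concrete picture of the new interface (again a sketch, not code from
either patch): the "resultcache" simplehash instantiation in v8-0003 below
gets a generated resultcache_delete_item() alongside the existing
resultcache_delete(), so a caller that already holds the entry pointer can
unlink it without re-hashing and probing.  The function below assumes it
lives inside nodeResultCache.c, where ResultCacheState, ResultCacheEntry and
the hash table are defined.

static void
drop_entry_sketch(ResultCacheState *rcstate, ResultCacheEntry *entry)
{
	/*
	 * 'entry' was already located, e.g. by an earlier lookup or by walking
	 * the LRU list, so unlink it directly.  Deleting by key here would mean
	 * repopulating the probe slot, re-hashing and searching the table again
	 * just to find this same element.
	 */
	resultcache_delete_item(rcstate->hashtable, entry);
}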

v8-0003-Add-Result-Cache-executor-node.patchapplication/octet-stream; name=v8-0003-Add-Result-Cache-executor-node.patchDownload
From 39fe858bdf9cade269088ac70ad2745faac772a0 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v8 3/3] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.

We also support caching the results from correlated subqueries.  However,
currently, since subqueries are planned before their parent query, we are
unable to obtain any estimations on the cache hit ratio.  For now, we opt
to just always put a Result Cache above a suitable correlated subquery. In
the future, we may like to be smarter about that, but for now, the
overhead of using the Result Cache, even in cases where we never get a
cache hit, is minimal.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   51 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  119 +-
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  132 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1122 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  149 +++
 src/backend/optimizer/path/joinpath.c         |  234 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    1 +
 src/backend/optimizer/plan/subselect.c        |  110 ++
 src/backend/optimizer/util/pathnode.c         |   70 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    6 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/lib/simplehash.h                  |    8 +-
 src/include/nodes/execnodes.h                 |   67 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/include/utils/selfuncs.h                  |    6 +-
 src/test/regress/expected/aggregates.out      |    8 +-
 src/test/regress/expected/groupingsets.out    |   20 +-
 src/test/regress/expected/join.out            |  129 +-
 src/test/regress/expected/join_hash.out       |   72 +-
 src/test/regress/expected/partition_prune.out |  237 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/rowsecurity.out     |   20 +-
 src/test/regress/expected/select_parallel.out |   28 +-
 src/test/regress/expected/subselect.out       |   44 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    2 +
 src/test/regress/sql/resultcache.sql          |   54 +
 49 files changed, 2935 insertions(+), 261 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 84bc0ee381..a6d7fbb0e5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1581,6 +1581,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1607,6 +1608,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2118,22 +2120,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2914,10 +2919,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2928,8 +2936,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2939,11 +2947,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d452d06343..172133005e 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -480,10 +480,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c4ba49ffaf..6b7f747d62 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4737,6 +4737,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans to the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5547..6d4b9eb3b9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1267,6 +1269,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1958,6 +1963,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3031,6 +3040,114 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+
+	initStringInfo(&keystr);
+
+	/* XXX surely we'll always have more than one if we have a resultcache? */
+	useprefix = list_length(es->rtable) > 1;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do anything.  We needn't consider
+			 * cache hits as we'll always get a miss before a hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
@@ -3097,7 +3214,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 			if (aggstate->hash_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-					aggstate->hash_disk_used);
+								 aggstate->hash_disk_used);
 			}
 		}
 
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..68920ecd89 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..5e30623ad1 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3487,3 +3487,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use. Must
+ * be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..459e9dd3e9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -293,6 +294,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -513,6 +518,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -989,6 +998,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1058,6 +1068,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1350,6 +1363,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 01b7b926bf..fbbe667cc1 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..09b25ea184
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1122 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- materialize the result of a subplan
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * ResultCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used; instead, the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * build_hash_table
+ *		Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* XXX I don't really plan on keeping this */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the
+ * entry which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so if we free that entry
+		 * we must set 'specialkey_intact' to false to inform the caller that
+		 * the specialkey entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Look up or add an entry to the cache.  No need to pass a valid key
+	 * since the hash function uses rcstate's probeslot, which we populated
+	 * above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* The cache entry is void of any tuples. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.  XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but return NULL
+			 * again in case something calls us by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = work_mem * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
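+	 * For example, with the default work_mem of 4MB, trimming back to 98%
+	 * frees only around 80kB of cache space in each eviction pass.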
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match.  For example, a join operator performing a
+	 * unique join is able to skip to the next outer tuple after getting the
+	 * first matching inner tuple.  In this case, the cache entry is complete
+	 * after getting the first tuple, which allows us to mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/*
+	 * Allocate and set up the actual cache.  We'll just use 1024 buckets if
+	 * the planner failed to come up with a better value.
+	 */
+	build_hash_table(rcstate, node->est_entries > 0 ? node->est_entries :
+					 1024);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
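+ *		Note that this only covers the per-entry bookkeeping overhead; the
+ *		planner adds the estimated size of the cached tuples themselves
+ *		separately (see cost_resultcache_rescan()).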
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40b82..74101d5f7f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -927,6 +927,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4936,6 +4963,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..27cc4c1864 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -836,6 +836,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3809,6 +3839,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4043,6 +4076,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..d5931b1651 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..543e2ba93b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4109,6 +4109,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d7f235645..f8046de043 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -132,6 +133,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2296,6 +2298,148 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+	int			flags;
+
+	double		work_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	work_mem_bytes = work_mem * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(work_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total of this node and how
+	 * many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
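+
+	/*
+	 * As an illustrative example of the above: with calls = 1000, ndistinct
+	 * = 100 and est_cache_entries >= 100, every distinct parameter value
+	 * fits in the cache, so evict_ratio is 0.0 and hit_ratio is 1.0 -
+	 * 100/1000 = 0.9; only the first scan for each distinct value is a miss.
+	 * If instead only 50 entries fit, evict_ratio becomes 0.5 and hit_ratio
+	 * becomes 50/100 - 100/1000 = 0.4.
+	 */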
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache, so we don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4036,6 +4180,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..239f400ebd 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,195 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, we fetch the outer side exprs and check for a valid hashable
+ * equality operator for each outer expr.  Returns true and sets the
+ * 'param_exprs' and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+			var_relids = pull_varnos((Node *) ((PlaceHolderVar *) expr)->phexpr);
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean resultcache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1671,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1680,24 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+				{
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
+				}
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1852,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1877,20 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+			{
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
+			}
+
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..45e211262a 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1516,6 +1529,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6341,6 +6404,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6929,6 +7014,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6974,6 +7060,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e946..a7af7dbed2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -679,6 +679,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
+		case T_ResultCache:
 		case T_Unique:
 		case T_SetOp:
 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6eb794669f..3e2c61b0a0 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -136,6 +137,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -233,6 +302,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKs, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless.  It might be better to do this only when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *cache_path;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			cache_path = create_resultcache_path(root,
+												 best_path->parent,
+												 best_path,
+												 param_exprs,
+												 operators,
+												 false,
+												 -1);
+			best_path = (Path *) cache_path;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
@@ -2718,6 +2821,13 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			/* XXX Check this is correct */
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e528e05459..6cf18a6803 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1551,6 +1551,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3849,6 +3898,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4067,6 +4127,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 596bcb7b84..60070ea76b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1016,6 +1016,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of caching results from parameterized plan nodes."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..6bca3dfc9f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..48dd235bfd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -263,6 +263,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
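(An aside, not part of the patch: the new dlist_move_tail() above is there to
pair with the lru_list kept in ResultCacheState further down -- on a cache hit
an entry can cheaply be re-marked as most recently used.  Below is a minimal
sketch of that LRU discipline using only the ilist.h API; the entry struct and
the demo_* function names are invented for the example.)

#include "postgres.h"
#include "lib/ilist.h"

/* Hypothetical cache entry; only the embedded dlist_node matters here. */
typedef struct DemoCacheEntry
{
	dlist_node	lru_node;		/* position in the LRU list */
	/* ... key and cached tuples would live here ... */
} DemoCacheEntry;

/* New entries start at the tail, i.e. the most-recently-used end. */
static void
demo_cache_insert(dlist_head *lru_list, DemoCacheEntry *entry)
{
	dlist_push_tail(lru_list, &entry->lru_node);
}

/* On a cache hit, make the entry the last candidate for eviction. */
static void
demo_cache_hit(dlist_head *lru_list, DemoCacheEntry *entry)
{
	dlist_move_tail(lru_list, &entry->lru_node);
}

/* To free space, evict from the head, i.e. the least-recently-used entry. */
static DemoCacheEntry *
demo_cache_next_victim(dlist_head *lru_list)
{
	dlist_node *node;

	if (dlist_is_empty(lru_list))
		return NULL;

	node = dlist_pop_head_node(lru_list);
	return dlist_container(DemoCacheEntry, lru_node, node);
}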
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index dc1f1df07e..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -921,11 +921,11 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 	tb->members--;
 
 	/*
-	 * Backward shift following elements till either an empty element
-	 * or an element at its optimal position is encountered.
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
 	 *
-	 * While that sounds expensive, the average chain length is short,
-	 * and deletions would otherwise require tombstones.
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
 	 */
 	while (true)
 	{
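(Another aside, not part of the patch: the hunk above is only a comment
re-wrap, but since the cache table is built on simplehash.h -- note the
"struct resultcache_hash *hashtable" member in execnodes.h below -- here is a
rough sketch of how such a table is typically generated from simplehash.h.
The key/entry layouts and the two helper functions are assumptions for
illustration only; the real instantiation lives in the executor code for the
node.)

typedef struct ResultCacheKey ResultCacheKey;	/* holds the parameter values */

typedef struct ResultCacheEntry
{
	ResultCacheKey *key;		/* lookup key */
	uint32		hash;			/* cached hash value of the key */
	char		status;			/* entry status, required by simplehash.h */
	/* ... pointers to the cached tuples would follow ... */
} ResultCacheEntry;

/* hash and equality over the parameter values; definitions not shown here */
static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
								   const ResultCacheKey *key);
static bool ResultCacheHash_equal(struct resultcache_hash *tb,
								  const ResultCacheKey *key1,
								  const ResultCacheKey *key2);

#define SH_PREFIX resultcache
#define SH_ELEMENT_TYPE ResultCacheEntry
#define SH_KEY_TYPE ResultCacheKey *
#define SH_KEY key
#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b)
#define SH_SCOPE static inline
#define SH_STORE_HASH
#define SH_GET_HASH(tb, a) a->hash
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"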
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..30f66d5058 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1984,6 +1985,72 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct resultcache_hash *hashtable; /* hash table cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* memory limit in bytes for the cache */
+	uint64		mem_lowerlimit; /* reduce memory usage to below this when we
+								 * free up space */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..94ab62f318 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -241,6 +243,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..79a4ad20dd 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..ac5685da64 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6141654e47..21d3dbdad4 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..816fb3366f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 455e1343ee..57ca9fda8d 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -70,9 +70,9 @@
  * callers to provide further details about some assumptions which were made
  * during the estimation.
  */
-#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
-											  * the DEFAULTs as defined above.
-											  */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -2577,6 +2579,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2595,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 701d52b465..2256f6da67 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -774,19 +774,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..fec710e411 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -2973,8 +2975,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2984,11 +2986,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
@@ -3510,8 +3514,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3521,17 +3525,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3541,9 +3547,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4068,11 +4076,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
+   ->  Result Cache
          Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4109,15 +4120,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
-         ->  Limit
+         ->  Result Cache
                Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
          Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+         Cache Key: (i8.q1), t2.f1
+         ->  Limit
+               Output: ((i8.q1)), (t2.f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: (i8.q1), t2.f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4163,14 +4180,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4808,34 +4828,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4890,14 +4916,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..5c826792f5 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
-                 Output: (hjtest_1.b * 5)
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
-                       Output: (hjtest_2.c * 5)
+                 ->  Result Cache
+                       Output: ((hjtest_2.c * 5))
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
-           Output: (hjtest_1.b * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_1.b * 5))
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
-                 Output: (hjtest_2.c * 5)
+           ->  Result Cache
+                 Output: ((hjtest_2.c * 5))
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
+                 ->  Result Cache
+                       Output: ((hjtest_1.b * 5))
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
+         SubPlan 2
+           ->  Result Cache
+                 Output: ((hjtest_1.b * 5))
+                 Cache Key: hjtest_1.b
                  ->  Result
                        Output: (hjtest_1.b * 5)
-         SubPlan 2
-           ->  Result
-                 Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
-           Output: (hjtest_2.c * 5)
-(28 rows)
+     ->  Result Cache
+           Output: ((hjtest_2.c * 5))
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 50d2a7e4b9..bab3b6401b 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2060,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2070,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2107,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2143,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2179,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2216,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..14e163a06f
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+   Recheck Cond: (unique1 < 1000)
+   Heap Blocks: exact=333
+   ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+         Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.twenty
+           Hits: 980  Misses: 20  Evictions: 0  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(13 rows)
+
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 9000  Misses: 1000  Evictions: 0  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 5339  Misses: 4661  Evictions: 4056  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=4661)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=4661)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
+RESET work_mem;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.hundred = t1.hundred)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+                                           QUERY PLAN                                            
+-------------------------------------------------------------------------------------------------
+ Gather (actual rows=1000 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+         Recheck Cond: (unique1 < 1000)
+         Heap Blocks: exact=333
+         ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+               Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.hundred
+           Hits: 900  Misses: 100  Evictions: 0  Overflows: 0
+           ->  Aggregate (actual rows=1 loops=100)
+                 ->  Index Only Scan using tenk1_hundred on tenk1 t2 (actual rows=100 loops=100)
+                       Index Cond: (hundred = t1.hundred)
+                       Heap Fetches: 0
+(16 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: 0  Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: 0  Overflows: 0
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+RESET enable_hashjoin;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 9b0c418db7..a3caf95c8d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1168,9 +1172,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index b81923f2e7..baf778d95c 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -921,19 +921,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
-           Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+     ->  Result Cache
+           Output: (now())
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
@@ -1044,19 +1050,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..2aa5cc5125 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -96,10 +96,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..317cd56eb2 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..04f0473b92 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -198,6 +198,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..b0bc88140f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1e904a8c5b..5ca0bcf238 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,8 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..52f614bdd4
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,54 @@
+-- Perform tests on the Result Cache node.
+
+-- Ensure we get the expected plan with sub plans.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+-- Ensure we get some evictions.  The number is likely to vary on different machines, so
+-- XXX I'll likely need to think about how to check this better.
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;
+RESET work_mem;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.hundred = t1.hundred)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+RESET enable_hashjoin;
-- 
2.21.0.windows.1

#51David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#50)
5 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 15 Sep 2020 at 12:58, David Rowley <dgrowleyml@gmail.com> wrote:

I've not done any further work to shift the patch in that direction
yet. I know it's going to be quite a bit of work, and it sounds like
there are still objections in both directions. I'd rather everyone
agreed on something before I go to the trouble of trying to make
something committable with Andres' way.

I spent some time converting the existing v8 to move the caching into
the Nested Loop node instead of having an additional Result Cache node
between the Nested Loop and the inner index scan. To minimise the size
of this patch I've dropped support for caching Subplans, for now.
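
To make the difference concrete, here is roughly how the two versions
show up in EXPLAIN for the same parameterized join (shapes abbreviated
from the expected-output changes in the attached patches; the relation,
column and index names below are only placeholders):

v8, with a separate node above the inner scan:

  Nested Loop
    ->  Seq Scan on outer_tab
    ->  Result Cache
          Cache Key: outer_tab.x
          ->  Index Scan using inner_tab_x_idx on inner_tab
                Index Cond: (x = outer_tab.x)

v9, with the cache built into the join node itself:

  Cached Nested Loop
    Cache Key: outer_tab.x
    ->  Seq Scan on outer_tab
    ->  Index Scan using inner_tab_x_idx on inner_tab
          Index Cond: (x = outer_tab.x)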

I'd say the quality of this patch is still first-draft. I just spent
today getting some final things working again, then spent a few hours
trying to break it and another few hours running benchmarks on it and
comparing it to the v8 patch (v8 uses a separate Result Cache node).

I'd say most of the patch is pretty good, but the changes I've made in
nodeNestloop.c will need some further work. All the caching logic is
in a new file named execMRUTupleCache.c; nodeNestloop.c is just a
consumer of it. The nested loop code can detect whether the
MRUTupleCache lookup was a hit or a miss depending on which slot the
tuple is returned in. So far I'm just using that to switch over to the
projection info and join quals that I initialised to work with the
MinimalTupleSlot from the cache. I'm not yet sure exactly how this
should be improved; I just know that what's there is not so great.

So far benchmarking shows there's still a regression from the v8
version of the patch. This test uses count(*). An earlier test [1] did
show speedups when we needed to deform tuples returned by the nested
loop node. I've not yet repeated that test. I was disappointed to see
v9 slower than v8 after having spent about 3 days rewriting the patch.

The setup for the test I did was:

create table hundredk (hundredk int, tenk int, thousand int, hundred
int, ten int, one int);
insert into hundredk select x%100000,x%10000,x%1000,x%100,x%10,1 from
generate_Series(1,100000) x;
create table lookup (a int);
insert into lookup select x from generate_Series(1,100000)x,
generate_Series(1,100);
create index on lookup(a);
vacuum analyze lookup, hundredk;

I then ran a query like:
select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

in pgbench for 60 seconds and then again after swapping the join
column to hk.hundred, hk.ten and hk.one so that fewer index lookups
were performed and more cache hits were seen.
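
Spelled out, that amounts to running the query above and then these
variants, changing only the join column:

select count(*) from hundredk hk inner join lookup l on hk.hundred = l.a;
select count(*) from hundredk hk inner join lookup l on hk.ten = l.a;
select count(*) from hundredk hk inner join lookup l on hk.one = l.a;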

I did have enable_mergejoin = off when testing v8 and v9 on this test.
The planner seemed to favour merge join over nested loop without that.

Results in hundred_rows_per_rescan.png.

I then reduced the lookup table so it only has 1 row to look up
instead of 100 for each value.

truncate lookup;
insert into lookup select x from generate_Series(1,100000)x;
vacuum analyze lookup;

and ran the tests again. Results in one_row_per_rescan.png.

I also wanted to note that these small-scale tests are not the best
case for this patch. I've seen much more significant gains when an
unpatched Hash join's hash table filled the L3 cache and started
having to wait for RAM. Since my MRU cache was much smaller than the
Hash join's hash table, it performed about 3x faster. What I'm trying
to focus on here is the regression from v8 to v9. It seems to cast a
bit more doubt as to whether v9 is any better than v8.

I really would like to start moving this work towards a commit in the
next month or two, so any comments about v8 vs v9 would be welcome, as
I'm still uncertain which patch is best to pursue.

David

[1]: /messages/by-id/CAApHDvpDdQDFSM+u19ROinT0qw41OX=MW4-B2mO003v6-X0AjA@mail.gmail.com

Attachments:

v9-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch
From d079c58d315851e0613b745aeb4ff474321c8458 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v9 1/3] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() which allows it
to set bits in a flags variable in order to pass back additional details to
the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 21 ++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 11 ++++++++++-
 8 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..90bf40438b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2961,7 +2961,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 733f7ea543..a0877e2be4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index bcb1bc6097..4f6ab5d635 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1986,6 +1986,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 986d7a52e3..8d36ab8129 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3715,7 +3715,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3740,7 +3741,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3756,7 +3758,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4801,7 +4803,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 745f443e5c..f33033bc27 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 5281a2f998..138a353f93 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index bec357fcef..3f98f14405 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	flags - When passed as non-NULL, the function sets bits in this
+ *		parameter to provide further details to callers about some
+ *		assumptions which were made when performing the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3365,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, int *flags)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3373,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the flags output parameter, if set */
+	if (flags != NULL)
+		*flags = 0;
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3580,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better mark
+					 * that fact in the selectivity flags variable.
+					 */
+					if (flags != NULL && varinfo2->isdefault)
+						*flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 7ac4a06391..455e1343ee 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,14 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
+											  * the DEFAULTs as defined above.
+											  */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +202,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  int *flags);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.21.0.windows.1

v9-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch
From beef3a998ce70fc76d9df8dc197434a8bdc43e3f Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v9 2/3] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an up-coming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..dc1f1df07e 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element
+	 * or an element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short,
+	 * and deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.21.0.windows.1

v9-0003-Allow-parameterized-Nested-Loops-to-cache-tuples-.patch
From a6a278ddeca2c3369b80ffe02f18494531011c5f Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 20 Oct 2020 13:36:48 +1300
Subject: [PATCH v9 3/3] Allow parameterized Nested Loops to cache tuples from
 inner scans

Traditionally a parameterized nested loop would always perform another
inner scan each time the parameter values for the scan changed.  This was
quite wasteful when we had repeat lookups for the same value again and
again.

Here we add support to allow nested loops to remember the resulting tuples
from a scan and reuse those if we see the same parameter values on a
subsequent scan.

These results are stored within a hash table, the size of which is limited
by hash_mem.  When the cache becomes full, the least recently looked up
entries are evicted from the cache to make way for new tuples.

In the query plan, these appear as "Cached Nested Loop" nodes.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  71 +-
 doc/src/sgml/config.sgml                      |  19 +
 src/backend/commands/explain.c                | 146 ++-
 src/backend/executor/Makefile                 |   1 +
 src/backend/executor/execExpr.c               | 132 +++
 src/backend/executor/execMRUTupleCache.c      | 981 ++++++++++++++++++
 src/backend/executor/execParallel.c           |  17 +
 src/backend/executor/nodeNestloop.c           | 213 +++-
 src/backend/nodes/copyfuncs.c                 |   8 +-
 src/backend/nodes/outfuncs.c                  |   7 +
 src/backend/nodes/readfuncs.c                 |   7 +
 src/backend/optimizer/path/costsize.c         | 266 ++++-
 src/backend/optimizer/path/joinpath.c         | 437 ++++++++
 src/backend/optimizer/plan/createplan.c       |  60 +-
 src/backend/optimizer/util/pathnode.c         | 144 ++-
 src/backend/utils/adt/ruleutils.c             |   7 +-
 src/backend/utils/misc/guc.c                  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/executor/execMRUTupleCache.h      |  97 ++
 src/include/executor/executor.h               |   6 +
 src/include/executor/nodeNestloop.h           |   5 +
 src/include/lib/ilist.h                       |  19 +
 src/include/lib/simplehash.h                  |   8 +-
 src/include/nodes/execnodes.h                 |   9 +
 src/include/nodes/pathnodes.h                 |  34 +-
 src/include/nodes/plannodes.h                 |  13 +
 src/include/optimizer/cost.h                  |  13 +-
 src/include/optimizer/pathnode.h              |  14 +
 src/include/utils/selfuncs.h                  |   6 +-
 src/test/regress/expected/join.out            | 131 ++-
 src/test/regress/expected/partition_prune.out |  33 +-
 src/test/regress/expected/subselect.out       |   5 +-
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/join.sql                 |  38 +
 src/test/regress/sql/partition_prune.sql      |   3 +
 35 files changed, 2828 insertions(+), 136 deletions(-)
 create mode 100644 src/backend/executor/execMRUTupleCache.c
 create mode 100644 src/include/executor/execMRUTupleCache.h

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..dd72764b36 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2114,8 +2114,9 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
-   ->  Nested Loop
+   ->  Cached Nested Loop
          Output: t1."C 1"
+         Cache Key: t1.c2
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
          ->  HashAggregate
@@ -2125,7 +2126,7 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
                      Output: t2.c1, t3.c1
                      Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
                      Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+(14 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -7360,26 +7361,27 @@ analyze loct1;
 -- inner join; expressions in the clauses appear in the equivalence class list
 explain (verbose, costs off)
 	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
-                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
  Limit
    Output: foo.f1, loct1.f1, foo.f2
-   ->  Sort
+   ->  Cached Nested Loop
          Output: foo.f1, loct1.f1, foo.f2
-         Sort Key: foo.f2
-         ->  Merge Join
-               Output: foo.f1, loct1.f1, foo.f2
-               Merge Cond: (foo.f1 = loct1.f1)
-               ->  Merge Append
-                     Sort Key: foo.f1
-                     ->  Index Scan using i_foo_f1 on public.foo foo_1
+         Cache Key: foo.f1
+         ->  Merge Append
+               Sort Key: foo.f2
+               ->  Sort
+                     Output: foo_1.f1, foo_1.f2
+                     Sort Key: foo_1.f2
+                     ->  Seq Scan on public.foo foo_1
                            Output: foo_1.f1, foo_1.f2
-                     ->  Foreign Scan on public.foo2 foo_2
-                           Output: foo_2.f1, foo_2.f2
-                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC NULLS LAST
-               ->  Index Only Scan using i_loct1_f1 on public.loct1
-                     Output: loct1.f1
-(17 rows)
+               ->  Foreign Scan on public.foo2 foo_2
+                     Output: foo_2.f1, foo_2.f2
+                     Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f2 ASC NULLS LAST
+         ->  Index Only Scan using i_loct1_f1 on public.loct1
+               Output: loct1.f1
+               Index Cond: (loct1.f1 = foo.f1)
+(18 rows)
 
 select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
  f1 | f1 
@@ -7400,26 +7402,27 @@ select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.
 -- list but no output change as compared to the previous query
 explain (verbose, costs off)
 	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
-                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
+                                         QUERY PLAN                                         
+--------------------------------------------------------------------------------------------
  Limit
    Output: foo.f1, loct1.f1, foo.f2
-   ->  Sort
+   ->  Cached Nested Loop Left Join
          Output: foo.f1, loct1.f1, foo.f2
-         Sort Key: foo.f2
-         ->  Merge Left Join
-               Output: foo.f1, loct1.f1, foo.f2
-               Merge Cond: (foo.f1 = loct1.f1)
-               ->  Merge Append
-                     Sort Key: foo.f1
-                     ->  Index Scan using i_foo_f1 on public.foo foo_1
+         Cache Key: foo.f1
+         ->  Merge Append
+               Sort Key: foo.f2
+               ->  Sort
+                     Output: foo_1.f1, foo_1.f2
+                     Sort Key: foo_1.f2
+                     ->  Seq Scan on public.foo foo_1
                            Output: foo_1.f1, foo_1.f2
-                     ->  Foreign Scan on public.foo2 foo_2
-                           Output: foo_2.f1, foo_2.f2
-                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC NULLS LAST
-               ->  Index Only Scan using i_loct1_f1 on public.loct1
-                     Output: loct1.f1
-(17 rows)
+               ->  Foreign Scan on public.foo2 foo_2
+                     Output: foo_2.f1, foo_2.f2
+                     Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f2 ASC NULLS LAST
+         ->  Index Only Scan using i_loct1_f1 on public.loct1
+               Output: loct1.f1
+               Index Cond: (loct1.f1 = foo.f1)
+(18 rows)
 
 select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
  f1 | f1 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..5a82e4e7ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4718,6 +4718,25 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-cachednestloop" xreflabel="enable_cachednestloop">
+      <term><varname>enable_cachednestloop</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_cachednestloop</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's ability to use cached
+        parameterized nested loop joins.  Such joins allow the inner
+        parameterized scan of a nested loop join to be cached so that repeat
+        lookups are likely to find the tuples already cached rather than having
+        to perform another inner scan.  Less commonly looked up results may be
+        evicted from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
       <term><varname>enable_gathermerge</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 41317f1837..a3387959f3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
 #include "commands/createas.h"
 #include "commands/defrem.h"
 #include "commands/prepare.h"
+#include "executor/execMRUTupleCache.h"
 #include "executor/nodeHash.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
@@ -108,6 +109,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_cachednestloop_info(NestLoopState *nlstate, List *ancestors,
+									 ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1170,7 +1173,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			pname = sname = "BitmapOr";
 			break;
 		case T_NestLoop:
-			pname = sname = "Nested Loop";
+			if (((NestLoop *) plan)->mrucache)
+				pname = sname = "Cached Nested Loop";
+			else
+				pname = sname = "Nested Loop";
 			break;
 		case T_MergeJoin:
 			pname = "Merge";	/* "Join" gets added by jointype switch */
@@ -1875,6 +1881,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			break;
 		case T_NestLoop:
+			if (((NestLoop *) plan)->mrucache)
+				show_cachednestloop_info((NestLoopState *) planstate, ancestors, es);
+
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
 			if (((NestLoop *) plan)->join.joinqual)
@@ -3028,6 +3037,141 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+static void
+show_cachednestloop_info(NestLoopState *nlstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) nlstate)->plan;
+	MRUTupleCache *mrucache = nlstate->nl_mrucache;
+
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	if (mrucache == NULL)
+		return;
+
+	initStringInfo(&keystr);
+
+	useprefix = list_length(es->rtable) > 1;
+
+	ancestors = lcons(plan, ancestors);
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((NestLoop *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	ancestors = list_delete_first(ancestors);
+
+	if (!es->analyze)
+		return;
+
+	if (mrucache->stats.mem_peak > 0)
+		memPeakKb = (mrucache->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (mrucache->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, mrucache->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, mrucache->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, mrucache->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, mrucache->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 mrucache->stats.cache_hits,
+						 mrucache->stats.cache_misses,
+						 mrucache->stats.cache_evictions,
+						 mrucache->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (nlstate->shared_info != NULL)
+	{
+		for (int n = 0; n < nlstate->shared_info->num_workers; n++)
+		{
+			MRUCacheInstrumentation *si;
+
+			si = &nlstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do anything.  We needn't consider
+			 * cache hits as we'll always get a miss before a hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's MRUTupleCache.mem_used field is unavailable
+			 * to us, ExecEndNestLoop will have set the
+			 * MRUCacheInstrumentation.mem_peak field for us.  No need to do
+			 * the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses, si->cache_evictions, si->cache_overflows, memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB",memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..e33e8f2f28 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	execIndexing.o \
 	execJunk.o \
 	execMain.o \
+	execMRUTupleCache.o \
 	execParallel.o \
 	execPartition.o \
 	execProcnode.o \
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 868f8b0858..fdc94b9914 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3470,3 +3470,135 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * ops: the slot ops for the inner/outer tuple slots
+ * eqFunctions: array of function oids of the equality functions to use
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *ops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = ops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execMRUTupleCache.c b/src/backend/executor/execMRUTupleCache.c
new file mode 100644
index 0000000000..3553dc26cb
--- /dev/null
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -0,0 +1,981 @@
+/*-------------------------------------------------------------------------
+ *
+ * execMRUTupleCache.c
+ *	  Routines setting up and using a most-recently-used cache to store sets
+ *	  of tuples for a given cache key.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/execMRUTupleCache.c
+ *
+ * A set of functions for setting up and using a most-recently-used tuple
+ * cache.  Sets of tuples are stored by the cache key and are located in RAM.
+ * When we're asked to cache tuples that would cause us to exceed the memory
+ * limit imposed by the caller, the least recently looked up cache
+ * entry is evicted from cache to make way for the new entry.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
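+ *
+ * The calling protocol assumed here (and followed by nodeNestloop.c later in
+ * this patch) is roughly: build the cache once with ExecMRUTupleCacheInit(),
+ * then for each set of parameter values call ExecMRUTupleCacheFetch()
+ * repeatedly until it returns NULL, call ExecMRUTupleCacheFinishScan() when
+ * that scan is finished and before rescanning with new parameters, and
+ * finally call ExecMRUTupleCacheCleanup() at executor shutdown.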
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/execMRUTupleCache.h"
+#include "executor/executor.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the MRUTupleCache's state machine
+ */
+#define MRUCACHE_LOOKUP				1	/* Attempt to find the first tuple for
+										 * a given key */
+#define MRUCACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define MRUCACHE_FILLING			3	/* Read next tuple to fill cache */
+#define MRUCACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * plan without caching anything */
+#define MRUCACHE_ENDOFSCAN			5	/* Scan complete; ready for rescan */
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(MRUCacheEntry) + \
+										 sizeof(MRUCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(MRUCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * MRUCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct MRUCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct MRUCacheTuple *next;	/* The next tuple with the same parameter
+								 * values or NULL if it's the last one */
+} MRUCacheTuple;
+
+/*
+ * MRUCacheKey
+ *		The hash table key for cached entries plus the LRU list link
+ */
+typedef struct MRUCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} MRUCacheKey;
+
+/*
+ * MRUCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct MRUCacheEntry
+{
+	MRUCacheKey *key;			/* Hash key for hash table lookups */
+	MRUCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+								 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Were all required tuples read from the
+								 * plan? */
+} MRUCacheEntry;
+
+
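+/*
+ * simplehash.h is included twice: the first pass (SH_DECLARE) emits the type
+ * and function declarations so that the hash and equality helpers below can
+ * reference them; the second pass (SH_DEFINE) emits the definitions, using
+ * those helpers via SH_HASH_KEY and SH_EQUAL.
+ */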
+#define SH_PREFIX mrucache
+#define SH_ELEMENT_TYPE MRUCacheEntry
+#define SH_KEY_TYPE MRUCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 MRUCacheHash_hash(struct mrucache_hash *tb,
+								const MRUCacheKey *key);
+static int	MRUCacheHash_equal(struct mrucache_hash *tb,
+							   const MRUCacheKey *params1,
+							   const MRUCacheKey *params2);
+
+#define SH_PREFIX mrucache
+#define SH_ELEMENT_TYPE MRUCacheEntry
+#define SH_KEY_TYPE MRUCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) MRUCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (MRUCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * MRUCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the MRUTupleCache's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+MRUCacheHash_hash(struct mrucache_hash *tb, const MRUCacheKey *key)
+{
+	MRUTupleCache *mrucache = (MRUTupleCache *) tb->private_data;
+	TupleTableSlot *pslot = mrucache->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = mrucache->nkeys;
+	FmgrInfo   *hashfunctions = mrucache->hashfunctions;
+	Oid		   *collations = mrucache->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * MRUCacheHash_equal
+ *		Equality function for confirming key matches during a hash table
+ *		lookup.  'key2' is never used; instead, the MRUTupleCache's probeslot
+ *		is always populated with details of what's being looked up.
+ */
+static int
+MRUCacheHash_equal(struct mrucache_hash *tb, const MRUCacheKey *key1,
+					  const MRUCacheKey *key2)
+{
+	MRUTupleCache *mrucache = (MRUTupleCache *) tb->private_data;
+	ExprContext *econtext = mrucache->ps_ExprContext;
+	TupleTableSlot *tslot = mrucache->tableslot;
+	TupleTableSlot *pslot = mrucache->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(mrucache->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(MRUTupleCache *mrucache, uint32 size)
+{
+	/* mrucache_create will convert the size to a power of 2 */
+	mrucache->hashtable = mrucache_create(mrucache->tableContext, size,
+										  mrucache);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate mrucache's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then populate the slot by evaluating
+ *		mrucache's param_exprs.
+ */
+static inline void
+prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
+{
+	TupleTableSlot *pslot = mrucache->probeslot;
+	TupleTableSlot *tslot = mrucache->tableslot;
+	int				numKeys = mrucache->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
+												mrucache->ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from a cache entry, leaving an empty cache entry.
+ *		Also update memory accounting to reflect the removal of the tuples.
+ */
+static void
+entry_purge_tuples(MRUTupleCache *mrucache, MRUCacheEntry *entry)
+{
+	MRUCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		MRUCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	mrucache->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(MRUTupleCache *mrucache, MRUCacheEntry *entry)
+{
+	MRUCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(mrucache, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	mrucache->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(mrucache->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* XXX I don't really plan on keeping this */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < mrucache->hashtable->size; i++)
+		{
+			MRUCacheEntry *entry = &mrucache->hashtable->data[i];
+
+			if (entry->status == mrucache_SH_IN_USE)
+			{
+
+				MRUCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == mrucache->hashtable->members);
+		Assert(mem == mrucache->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	mrucache_delete_item(mrucache->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		MRUTupleCache's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(MRUTupleCache *mrucache, MRUCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (mrucache->mem_used > mrucache->stats.mem_peak)
+		mrucache->stats.mem_peak = mrucache->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(mrucache->mem_used > mrucache->mem_upperlimit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &mrucache->lru_list)
+	{
+		MRUCacheKey *key = dlist_container(MRUCacheKey, lru_node, iter.cur);
+		MRUCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(mrucache, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = mrucache_lookup(mrucache->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so if we do evict it we
+		 * must set 'specialkey_intact' to false to inform the caller that the
+		 * specialkey entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(mrucache, entry);
+
+		mrucache->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (mrucache->mem_used <= mrucache->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static MRUCacheEntry *
+cache_lookup(MRUTupleCache *mrucache, bool *found)
+{
+	MRUCacheKey *key;
+	MRUCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(mrucache, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses mrucache's probeslot, which we populated above.
+	 */
+	entry = mrucache_insert(mrucache->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&mrucache->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mrucache->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (MRUCacheKey *) palloc(sizeof(MRUCacheKey));
+	key->params = ExecCopySlotMinimalTuple(mrucache->probeslot);
+
+	/* Update the total cache memory utilization */
+	mrucache->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&mrucache->lru_list, &entry->key->lru_node);
+
+	mrucache->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (mrucache->mem_used > mrucache->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(mrucache, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != mrucache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(mrucache, key);
+
+			/* Re-find the newly added entry */
+			entry = mrucache_lookup(mrucache->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the mrucache's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		mrucache's last_tuple field must point to the tail of
+ *		mrucache->entry's list of tuples.
+ */
+static bool
+cache_store_tuple(MRUTupleCache *mrucache, TupleTableSlot *slot)
+{
+	MRUCacheTuple *tuple;
+	MRUCacheEntry *entry = mrucache->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(mrucache->tableContext);
+
+	tuple = (MRUCacheTuple *) palloc(sizeof(MRUCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	mrucache->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		/* XXX use slist? */
+		mrucache->last_tuple->next = tuple;
+	}
+
+	mrucache->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (mrucache->mem_used > mrucache->mem_upperlimit)
+	{
+		MRUCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(mrucache, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != mrucache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(mrucache, key);
+
+			/* Re-find the entry */
+			mrucache->entry = entry = mrucache_lookup(mrucache->hashtable,
+													  NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+/*
+ * ExecMRUTupleCacheFinishScan
+ *		Reset the cache state ready for the next lookup.  Callers must call
+ *		this after they finish a parameterized scan.
+ */
+void
+ExecMRUTupleCacheFinishScan(MRUTupleCache *mrucache)
+{
+	mrucache->state = MRUCACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	mrucache->entry = NULL;
+	mrucache->last_tuple = NULL;
+}
+
+TupleTableSlot *
+ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
+{
+	PlanState *plan = mrucache->subplan;
+	TupleTableSlot *slot;
+
+	switch (mrucache->state)
+	{
+		case MRUCACHE_LOOKUP:
+			{
+				MRUCacheEntry *entry;
+				bool		found;
+
+				Assert(mrucache->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the plan will return for these
+				 * parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the plan.  If we find one
+				 * there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(mrucache, &found);
+
+				if (found && entry->complete)
+				{
+					mrucache->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * MRUCACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					mrucache->last_tuple = entry->tuplehead;
+					mrucache->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;
+
+						ExecClearTuple(mrucache->cachefoundslot);
+						slot = mrucache->cachefoundslot;
+						ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
+						return slot;
+					}
+					else
+					{
+						/* The cache entry is void of any tuples. */
+						mrucache->state = MRUCACHE_ENDOFSCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *slot;
+
+					mrucache->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the subplan will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(mrucache, entry);
+					}
+
+					/* Scan the subplan for a tuple to cache */
+					slot = ExecProcNode(plan);
+					if (TupIsNull(slot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * the state to MRUCACHE_ENDOFSCAN.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						mrucache->state = MRUCACHE_ENDOFSCAN;
+						return NULL;
+					}
+
+					mrucache->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+						!cache_store_tuple(mrucache, slot)))
+					{
+						mrucache->stats.cache_overflows += 1;	/* stats update */
+
+						mrucache->state = MRUCACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = mrucache->singlerow;
+						mrucache->state = MRUCACHE_FILLING;
+					}
+
+					return slot;
+				}
+			}
+
+		case MRUCACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(mrucache->entry != NULL);
+				Assert(mrucache->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				mrucache->last_tuple = mrucache->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (mrucache->last_tuple == NULL)
+				{
+					mrucache->state = MRUCACHE_ENDOFSCAN;
+					return NULL;
+				}
+
+				ExecClearTuple(mrucache->cachefoundslot);
+				slot = mrucache->cachefoundslot;
+				ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
+				return slot;
+			}
+
+		case MRUCACHE_FILLING:
+			{
+				TupleTableSlot *slot;
+				MRUCacheEntry *entry = mrucache->entry;
+
+				/*
+				 * entry should already have been set in the MRUCACHE_LOOKUP
+				 * state.
+				 */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the MRUCACHE_FILLING state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				slot = ExecProcNode(plan);
+				if (TupIsNull(slot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					mrucache->state = MRUCACHE_ENDOFSCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.  XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(mrucache, slot)))
+					{
+						/* Couldn't store it?  Handle overflow. */
+						mrucache->stats.cache_overflows += 1;	/* stats update */
+
+						mrucache->state = MRUCACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					return slot;
+				}
+			}
+
+		case MRUCACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *slot;
+
+				/*
+				 * We end up in bypass mode when we're unable to fit all of
+				 * the tuples for a given key in the cache, despite evicting
+				 * everything else from the cache.
+				 *
+				 * We just continue to read tuples without caching.  We need
+				 * to wait until the next rescan before we can come out of
+				 * this mode. Perhaps the tuples for the next lookup key will
+				 * fit.
+				 */
+				slot = ExecProcNode(plan);
+				if (TupIsNull(slot))
+				{
+					mrucache->state = MRUCACHE_ENDOFSCAN;
+					return NULL;
+				}
+
+				return slot;
+			}
+
+		case MRUCACHE_ENDOFSCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized mrucache state: %d",
+				 (int) mrucache->state);
+			return NULL;
+	}							/* switch */
+}
+
+/*
+ * ExecMRUTupleCacheInit
+ *		Builds and returns a MRUTupleCache struct to allow caching of tuples
+ *		from 'cache_planstate'.
+ *
+ * 'planstate' the parent plan node that we're performing caching for.
+ * 'cache_planstate' the sub-node of 'planstate' that we're to cache tuples
+ *		from.
+ * 'param_exprs' the cache key parameters
+ * 'hashOperators' the operators for the hash functions to use to hash the
+ *		cache key exprs.  Must have list_length(param_exprs) elements.
+ * 'collations' collations for cache key exprs.  Must have
+ *		list_length(param_exprs) elements.
+ * 'memory_limit_bytes' the number of bytes to limit the size of the cache to.
+ * 'est_entries' the estimated number of entries we expect to cache. Or 0 if
+ *		unknown.
+ * 'singlerow' if true, mark the cache entry as complete after fetching the
+ *		first tuple.  Some callers may wish to pass this as true if they only
+ *		need to fetch 1 tuple and would like the cache entry for that 1 tuple
+ *		to become valid after the first tuple is fetched.
+ */
+MRUTupleCache *
+ExecMRUTupleCacheInit(PlanState *planstate, PlanState *cache_planstate,
+					  List *param_exprs, Oid *hashOperators, Oid *collations,
+					  uint64 memory_limit_bytes, int est_entries,
+					  bool singlerow)
+{
+	MRUTupleCache *mrucache = (MRUTupleCache *) palloc0(sizeof(MRUTupleCache));
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	mrucache->subplan = cache_planstate;
+	mrucache->ps_ExprContext = CreateExprContext(planstate->state);
+	mrucache->state = MRUCACHE_LOOKUP;
+
+	mrucache->nkeys = nkeys = list_length(param_exprs);
+	mrucache->hashkeydesc = ExecTypeFromExprList(param_exprs);
+	mrucache->tableslot = MakeSingleTupleTableSlot(mrucache->hashkeydesc,
+												   &TTSOpsMinimalTuple);
+	mrucache->cachefoundslot = MakeSingleTupleTableSlot(cache_planstate->ps_ResultTupleDesc,
+		&TTSOpsMinimalTuple);
+	mrucache->probeslot = MakeSingleTupleTableSlot(mrucache->hashkeydesc,
+												   &TTSOpsVirtual);
+
+	mrucache->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	mrucache->collations = collations;
+	mrucache->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = (Oid *) palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &mrucache->hashfunctions[i]);
+
+		mrucache->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) planstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	mrucache->cache_eq_expr = ExecBuildParamSetEqual(mrucache->hashkeydesc,
+													 &TTSOpsMinimalTuple,
+													 eqfuncoids,
+													 collations,
+													 param_exprs,
+													 (PlanState *) planstate);
+
+	pfree(eqfuncoids);
+	mrucache->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	mrucache->mem_upperlimit = memory_limit_bytes;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	mrucache->mem_lowerlimit = mrucache->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	mrucache->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												   "MRUCacheHashTable",
+												   ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&mrucache->lru_list);
+	mrucache->last_tuple = NULL;
+	mrucache->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match. e.g. A join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as so.
+	 */
+	mrucache->singlerow = singlerow;
+
+	/* Zero the statistics counters */
+	memset(&mrucache->stats, 0, sizeof(MRUCacheInstrumentation));
+
+	/*
+	 * Allocate and set up the actual cache.  We'll just use 1024 buckets if
+	 * the caller did not specify an estimate.
+	 */
+	build_hash_table(mrucache, est_entries > 0 ? est_entries :
+					 1024);
+
+	return mrucache;
+}
+
+void
+ExecMRUTupleCacheCleanup(MRUTupleCache *mrucache)
+{
+	/* Remove the cache context */
+	MemoryContextDelete(mrucache->tableContext);
+
+	ExecClearTuple(mrucache->cachefoundslot);
+	FreeExprContext(mrucache->ps_ExprContext, false);
+}
+
+/*
+ * ExecEstimateMRUCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateMRUCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(MRUCacheEntry) + sizeof(MRUCacheKey) +
+		sizeof(MRUCacheTuple) * ntuples;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..38973b1591 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeNestloop.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -276,6 +277,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 				ExecHashJoinEstimate((HashJoinState *) planstate,
 									 e->pcxt);
 			break;
+		case T_NestLoopState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecNestLoopEstimate((NestLoopState *) planstate, e->pcxt);
+			break;
 		case T_HashState:
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecHashEstimate((HashState *) planstate, e->pcxt);
@@ -496,6 +501,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 				ExecHashJoinInitializeDSM((HashJoinState *) planstate,
 										  d->pcxt);
 			break;
+		case T_NestLoopState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecNestLoopInitializeDSM((NestLoopState *) planstate, d->pcxt);
+			break;
 		case T_HashState:
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecHashInitializeDSM((HashState *) planstate, d->pcxt);
@@ -985,6 +994,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 				ExecHashJoinReInitializeDSM((HashJoinState *) planstate,
 											pcxt);
 			break;
+		case T_NestLoopState:
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
@@ -1045,6 +1055,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	/* Perform any node-type-specific work that needs to be done. */
 	switch (nodeTag(planstate))
 	{
+		case T_NestLoopState:
+			ExecNestLoopRetrieveInstrumentation((NestLoopState *) planstate);
+			break;
 		case T_SortState:
 			ExecSortRetrieveInstrumentation((SortState *) planstate);
 			break;
@@ -1332,6 +1345,10 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 				ExecHashJoinInitializeWorker((HashJoinState *) planstate,
 											 pwcxt);
 			break;
+		case T_NestLoopState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecNestLoopInitializeWorker((NestLoopState *) planstate, pwcxt);
+			break;
 		case T_HashState:
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecHashInitializeWorker((HashState *) planstate, pwcxt);
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b07c2996d4..fbefc127b2 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -21,11 +21,43 @@
 
 #include "postgres.h"
 
+#include "executor/execMRUTupleCache.h"
 #include "executor/execdebug.h"
+#include "executor/nodeHash.h"
 #include "executor/nodeNestloop.h"
 #include "miscadmin.h"
 #include "utils/memutils.h"
 
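+/*
+ * FetchInnerTuple
+ *		Fetch the next inner tuple, either directly from the inner plan or,
+ *		when an MRU cache is in use, via the cache.  When the returned tuple
+ *		comes from the cache it arrives in a MinimalTuple slot, so we must
+ *		switch to the projection info and quals that were built for minimal
+ *		tuple inner slots; otherwise we use the ones built for the plan's
+ *		normal inner slot type.
+ */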
+static inline TupleTableSlot *
+FetchInnerTuple(NestLoopState *nlstate, PlanState *innerPlan)
+{
+	MRUTupleCache *mrucache = nlstate->nl_mrucache;
+
+	/* No caching? Just exec the inner node */
+	if (mrucache == NULL)
+		return ExecProcNode(innerPlan);
+
+	/* Otherwise let the cache deal with it */
+	else
+	{
+		TupleTableSlot *slot = ExecMRUTupleCacheFetch(mrucache);
+
+		if (slot == mrucache->cachefoundslot)
+		{
+			nlstate->js.ps.ps_ProjInfo = nlstate->ps_CacheProjInfo;
+			nlstate->js.ps.qual = nlstate->ps_CacheQual;
+			nlstate->js.joinqual = nlstate->ps_CacheJoinqual;
+		}
+		else
+		{
+			nlstate->js.ps.ps_ProjInfo = nlstate->ps_ScanProjInfo;
+			nlstate->js.ps.qual = nlstate->ps_ScanQual;
+			nlstate->js.joinqual = nlstate->ps_ScanJoinqual;
+		}
+		return slot;
+	}
+}
+
 
 /* ----------------------------------------------------------------
  *		ExecNestLoop(node)
@@ -66,8 +98,6 @@ ExecNestLoop(PlanState *pstate)
 	PlanState  *outerPlan;
 	TupleTableSlot *outerTupleSlot;
 	TupleTableSlot *innerTupleSlot;
-	ExprState  *joinqual;
-	ExprState  *otherqual;
 	ExprContext *econtext;
 	ListCell   *lc;
 
@@ -79,8 +109,6 @@ ExecNestLoop(PlanState *pstate)
 	ENL1_printf("getting info from node");
 
 	nl = (NestLoop *) node->js.ps.plan;
-	joinqual = node->js.joinqual;
-	otherqual = node->js.ps.qual;
 	outerPlan = outerPlanState(node);
 	innerPlan = innerPlanState(node);
 	econtext = node->js.ps.ps_ExprContext;
@@ -150,6 +178,14 @@ ExecNestLoop(PlanState *pstate)
 			 */
 			ENL1_printf("rescanning inner plan");
 			ExecReScan(innerPlan);
+
+			/*
+			 * When using an MRU cache, reset the state ready for another
+			 * lookup.
+			 */
+			if (node->nl_mrucache)
+				ExecMRUTupleCacheFinishScan(node->nl_mrucache);
+
 		}
 
 		/*
@@ -157,7 +193,7 @@ ExecNestLoop(PlanState *pstate)
 		 */
 		ENL1_printf("getting new inner tuple");
 
-		innerTupleSlot = ExecProcNode(innerPlan);
+		innerTupleSlot = FetchInnerTuple(node, innerPlan);
 		econtext->ecxt_innertuple = innerTupleSlot;
 
 		if (TupIsNull(innerTupleSlot))
@@ -180,7 +216,7 @@ ExecNestLoop(PlanState *pstate)
 
 				ENL1_printf("testing qualification for outer-join tuple");
 
-				if (otherqual == NULL || ExecQual(otherqual, econtext))
+				if (node->js.ps.qual == NULL || ExecQual(node->js.ps.qual, econtext))
 				{
 					/*
 					 * qualification was satisfied so we project and return
@@ -211,7 +247,7 @@ ExecNestLoop(PlanState *pstate)
 		 */
 		ENL1_printf("testing qualification");
 
-		if (ExecQual(joinqual, econtext))
+		if (ExecQual(node->js.joinqual, econtext))
 		{
 			node->nl_MatchedOuter = true;
 
@@ -230,7 +266,7 @@ ExecNestLoop(PlanState *pstate)
 			if (node->js.single_match)
 				node->nl_NeedNewOuter = true;
 
-			if (otherqual == NULL || ExecQual(otherqual, econtext))
+			if (node->js.ps.qual == NULL || ExecQual(node->js.ps.qual, econtext))
 			{
 				/*
 				 * qualification was satisfied so we project and return the
@@ -306,15 +342,18 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 */
 	ExecInitResultTupleSlotTL(&nlstate->js.ps, &TTSOpsVirtual);
 	ExecAssignProjectionInfo(&nlstate->js.ps, NULL);
+	nlstate->ps_ScanProjInfo = nlstate->js.ps.ps_ProjInfo;
 
 	/*
 	 * initialize child expressions
 	 */
 	nlstate->js.ps.qual =
 		ExecInitQual(node->join.plan.qual, (PlanState *) nlstate);
+	nlstate->ps_ScanQual = nlstate->js.ps.qual;
 	nlstate->js.jointype = node->join.jointype;
 	nlstate->js.joinqual =
 		ExecInitQual(node->join.joinqual, (PlanState *) nlstate);
+	nlstate->ps_ScanJoinqual = nlstate->js.joinqual;
 
 	/*
 	 * detect whether we need only consider the first matching inner tuple
@@ -346,12 +385,59 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	nlstate->nl_NeedNewOuter = true;
 	nlstate->nl_MatchedOuter = false;
 
+	/* Set up the MRU cache, if enabled */
+	if (node->mrucache)
+	{
+		nlstate->nl_mrucache = ExecMRUTupleCacheInit((PlanState *) nlstate,
+													 (PlanState *) innerPlanState(nlstate),
+													 node->param_exprs,
+													 node->hashOperators,
+													 node->collations,
+													 get_hash_mem() * 1024L,
+													 node->est_entries,
+													 node->singlerow);
+
+		/*
+		 * Create a separate ProjectionInfo for projecting from the slots
+		 * belonging to the MRU cache.
+		 */
+		if (nlstate->js.ps.innerops != &TTSOpsMinimalTuple)
+		{
+			const TupleTableSlotOps *ttsops = nlstate->js.ps.innerops;
+			bool inneropsset = nlstate->js.ps.inneropsset;
+
+			nlstate->js.ps.innerops = &TTSOpsMinimalTuple;
+			nlstate->js.ps.inneropsset = true;
+
+			nlstate->ps_CacheProjInfo = ExecBuildProjectionInfo(nlstate->js.ps.plan->targetlist,
+																nlstate->js.ps.ps_ExprContext,
+																nlstate->js.ps.ps_ResultTupleSlot,
+																&nlstate->js.ps,
+																NULL);
+
+			nlstate->ps_CacheQual =
+				ExecInitQual(node->join.plan.qual, (PlanState *) nlstate);
+			nlstate->ps_CacheJoinqual =
+				ExecInitQual(node->join.joinqual, (PlanState *) nlstate);
+
+			/* Restore original values */
+			nlstate->js.ps.innerops = ttsops;
+			nlstate->js.ps.inneropsset = inneropsset;
+		}
+	}
+	else
+	{
+		nlstate->nl_mrucache = NULL;
+		nlstate->ps_CacheProjInfo = NULL;
+	}
+
 	NL1_printf("ExecInitNestLoop: %s\n",
 			   "node initialized");
 
 	return nlstate;
 }
 
+
 /* ----------------------------------------------------------------
  *		ExecEndNestLoop
  *
@@ -380,6 +466,29 @@ ExecEndNestLoop(NestLoopState *node)
 	ExecEndNode(outerPlanState(node));
 	ExecEndNode(innerPlanState(node));
 
+	if (node->nl_mrucache != NULL)
+	{
+		/*
+		 * When ending a parallel worker, copy the statistics gathered by the
+		 * worker back into shared memory so that it can be picked up by the main
+		 * process to report in EXPLAIN ANALYZE.
+		 */
+		if (node->shared_info && IsParallelWorker())
+		{
+			MRUCacheInstrumentation *si;
+
+			/* Make mem_peak available for EXPLAIN */
+			if (node->nl_mrucache->stats.mem_peak == 0)
+				node->nl_mrucache->stats.mem_peak = node->nl_mrucache->mem_used;
+
+			Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+			si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+			memcpy(si, &node->nl_mrucache->stats, sizeof(MRUCacheInstrumentation));
+		}
+
+		ExecMRUTupleCacheCleanup(node->nl_mrucache);
+	}
+
 	NL1_printf("ExecEndNestLoop: %s\n",
 			   "node processing ended");
 }
@@ -400,6 +509,8 @@ ExecReScanNestLoop(NestLoopState *node)
 	if (outerPlan->chgParam == NULL)
 		ExecReScan(outerPlan);
 
+	if (node->nl_mrucache != NULL)
+		ExecMRUTupleCacheFinishScan(node->nl_mrucache);
 	/*
 	 * innerPlan is re-scanned for each new outer tuple and MUST NOT be
 	 * re-scanned from here or you'll get troubles from inner index scans when
@@ -409,3 +520,89 @@ ExecReScanNestLoop(NestLoopState *node)
 	node->nl_NeedNewOuter = true;
 	node->nl_MatchedOuter = false;
 }
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecNestLoopEstimate
+ *
+ *		Estimate space required to propagate nested loop statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecNestLoopEstimate(NestLoopState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->js.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(MRUCacheInstrumentation));
+	size = add_size(size, offsetof(SharedMRUCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecNestLoopInitializeDSM
+ *
+ *		Initialize DSM space for nested loop statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecNestLoopInitializeDSM(NestLoopState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->js.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedMRUCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(MRUCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->js.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecNestLoopInitializeWorker
+ *
+ *		Attach worker to DSM space for nested loop statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecNestLoopInitializeWorker(NestLoopState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->js.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecNestLoopRetrieveInstrumentation
+ *
+ *		Transfer nested loop statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecNestLoopRetrieveInstrumentation(NestLoopState *node)
+{
+	Size		size;
+	SharedMRUCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedMRUCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(MRUCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2b4d7654cc..fa28ad7b1b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -837,7 +837,6 @@ static NestLoop *
 _copyNestLoop(const NestLoop *from)
 {
 	NestLoop   *newnode = makeNode(NestLoop);
-
 	/*
 	 * copy node superclass fields
 	 */
@@ -847,6 +846,13 @@ _copyNestLoop(const NestLoop *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(nestParams);
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, from->numKeys * sizeof(Oid));
+	COPY_POINTER_FIELD(collations, from->numKeys * sizeof(Oid));
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(mrucache);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 08a049232e..e4bbc688b6 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -732,6 +732,13 @@ _outNestLoop(StringInfo str, const NestLoop *node)
 	_outJoinPlanInfo(str, (const Join *) node);
 
 	WRITE_NODE_FIELD(nestParams);
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(mrucache);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_INT_FIELD(est_entries);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ab7b535caa..b62ac16cb4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2088,6 +2088,13 @@ _readNestLoop(void)
 	ReadCommonJoin(&local_node->join);
 
 	READ_NODE_FIELD(nestParams);
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(mrucache);
+	READ_BOOL_FIELD(singlerow);
+	READ_INT_FIELD(est_entries);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a0877e2be4..6a29143575 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
 #include "access/amapi.h"
 #include "access/htup_details.h"
 #include "access/tsmapi.h"
+#include "executor/execMRUTupleCache.h"
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
@@ -138,6 +139,7 @@ bool		enable_sort = true;
 bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
+bool		enable_cachednestloop = true;
 bool		enable_material = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
@@ -2736,10 +2738,11 @@ initial_cost_nestloop(PlannerInfo *root, JoinCostWorkspace *workspace,
 void
 final_cost_nestloop(PlannerInfo *root, NestPath *path,
 					JoinCostWorkspace *workspace,
-					JoinPathExtraData *extra)
+					JoinPathExtraData *extra,
+					bool enabled)
 {
-	Path	   *outer_path = path->outerjoinpath;
-	Path	   *inner_path = path->innerjoinpath;
+	Path	   *outer_path = path->jpath.outerjoinpath;
+	Path	   *inner_path = path->jpath.innerjoinpath;
 	double		outer_path_rows = outer_path->rows;
 	double		inner_path_rows = inner_path->rows;
 	Cost		startup_cost = workspace->startup_cost;
@@ -2754,18 +2757,18 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 	if (inner_path_rows <= 0)
 		inner_path_rows = 1;
 	/* Mark the path with the correct row estimate */
-	if (path->path.param_info)
-		path->path.rows = path->path.param_info->ppi_rows;
+	if (path->jpath.path.param_info)
+		path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
 	else
-		path->path.rows = path->path.parent->rows;
+		path->jpath.path.rows = path->jpath.path.parent->rows;
 
 	/* For partial paths, scale row estimate. */
-	if (path->path.parallel_workers > 0)
+	if (path->jpath.path.parallel_workers > 0)
 	{
-		double		parallel_divisor = get_parallel_divisor(&path->path);
+		double		parallel_divisor = get_parallel_divisor(&path->jpath.path);
 
-		path->path.rows =
-			clamp_row_est(path->path.rows / parallel_divisor);
+		path->jpath.path.rows =
+			clamp_row_est(path->jpath.path.rows / parallel_divisor);
 	}
 
 	/*
@@ -2773,12 +2776,12 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 	 * would amount to optimizing for the case where the join method is
 	 * disabled, which doesn't seem like the way to bet.
 	 */
-	if (!enable_nestloop)
+	if (!enabled)
 		startup_cost += disable_cost;
 
 	/* cost of inner-relation source data (we already dealt with outer rel) */
 
-	if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI ||
+	if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI ||
 		extra->inner_unique)
 	{
 		/*
@@ -2896,17 +2899,240 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 	}
 
 	/* CPU costs */
-	cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
+	cost_qual_eval(&restrict_qual_cost, path->jpath.joinrestrictinfo, root);
 	startup_cost += restrict_qual_cost.startup;
 	cpu_per_tuple = cpu_tuple_cost + restrict_qual_cost.per_tuple;
 	run_cost += cpu_per_tuple * ntuples;
 
 	/* tlist eval costs are paid per output row, not per tuple scanned */
-	startup_cost += path->path.pathtarget->cost.startup;
-	run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
+	startup_cost += path->jpath.path.pathtarget->cost.startup;
+	run_cost += path->jpath.path.pathtarget->cost.per_tuple * path->jpath.path.rows;
 
-	path->path.startup_cost = startup_cost;
-	path->path.total_cost = startup_cost + run_cost;
+	path->jpath.path.startup_cost = startup_cost;
+	path->jpath.path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * initial_cost_cached_nestloop
+ *	  Preliminary estimate of the cost of a cached nestloop join path.
+ *
+ * This must quickly produce lower-bound estimates of the path's startup and
+ * total costs.  If we are unable to eliminate the proposed path from
+ * consideration using the lower bounds, final_cost_cached_nestloop will be
+ * called to obtain the final estimates.
+ *
+ * The exact division of labor between this function and
+ * final_cost_cached_nestloop is private to them, and represents a tradeoff
+ * between speed of the initial estimate and getting a tight lower bound.  We
+ * choose to not examine the join quals here, since that's by far the most
+ * expensive part of the calculations.  The end result is that CPU-cost
+ * considerations must be left for the second phase; and for SEMI/ANTI joins,
+ * we must also postpone incorporation of the inner path's run cost.
+ *
+ * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
+ *		other data to be used by final_cost_cached_nestloop
+ * 'jointype' is the type of join to be performed
+ * 'outer_path' is the outer input to the join
+ * 'inner_path' is the inner input to the join
+ * 'extra' contains miscellaneous information about the join
+ * 'param_exprs' contains the list of exprs that the inner_path is
+ *		parameterized by.
+ *
+ * Returns the estimated number of entries which can be stored in the cache at
+ * a time.
+ */
+int
+initial_cost_cached_nestloop(PlannerInfo *root, JoinCostWorkspace *workspace,
+							 JoinType jointype,
+							 Path *outer_path, Path *inner_path,
+							 JoinPathExtraData *extra, List *param_exprs)
+{
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	double		outer_path_rows = outer_path->rows;
+	double		inner_path_rows = inner_path->rows;
+	Cost		inner_rescan_start_cost;
+	Cost		inner_rescan_total_cost;
+	Cost		input_total_cost = inner_path->total_cost;
+	Cost		input_startup_cost = inner_path->startup_cost;
+	Cost		inner_run_cost;
+	Cost		inner_rescan_run_cost;
+	int			width = inner_path->pathtarget->width;
+	int			flags;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	int			est_entries;
+
+	/* Protect some assumptions below that rowcounts aren't zero */
+	if (outer_path_rows <= 0)
+		outer_path_rows = 1;
+	if (inner_path_rows <= 0)
+		inner_path_rows = 1;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimates of how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(inner_path_rows, width) +
+		ExecEstimateMRUCacheEntryOverheadBytes(inner_path_rows);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, param_exprs, outer_path_rows, NULL,
+									&flags);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a cached nested loop.  The use of a
+	 * default could cause us to choose this plan type when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean the cached nested loop will never survive add_path().
+	 */
+	if ((flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = outer_path_rows;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to determine est_entries.
+	 * This will ultimately determine the initial size of the hash table that
+	 * the executor will use.  If we leave this at zero the executor will just
+	 * choose the size itself.  Really this is not the right place to do this,
+	 * but it's convenient since we already have the ndistinct estimate and an
+	 * estimate on the number of entries that will fit based on
+	 * hash_mem_bytes.
+	 */
+	est_entries = Min(Min(ndistinct, est_cache_entries), PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free, so here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to result in a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / outer_path_rows);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0);
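+
+	/*
+	 * As a purely illustrative example of the two ratios above: with
+	 * ndistinct = 100, est_cache_entries = 50 and outer_path_rows = 10000,
+	 * evict_ratio is 1.0 - 50/100 = 0.5 and hit_ratio is
+	 * 1.0/100 * 50 - 100/10000 = 0.49, i.e. roughly half of the inner
+	 * rescans are expected to be served from the cache.
+	 */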
+
+	/*
+	 * Set the inner_rescan_total_cost accounting for the expected cache hit
+	 * ratio.  We also add on a cpu_operator_cost to account for a cache
+	 * lookup. This will happen regardless of if it's a cache hit or not.
+	 */
+	inner_rescan_total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	inner_rescan_total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	inner_rescan_total_cost += cpu_operator_cost / 10.0 * evict_ratio * inner_path_rows;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	inner_rescan_start_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * NOTE: clearly, we must pay both outer and inner paths' startup_cost
+	 * before we can start returning tuples, so the join's startup cost is
+	 * their sum.  We'll also pay the inner path's rescan startup cost
+	 * multiple times.
+	 */
+	startup_cost += outer_path->startup_cost + inner_path->startup_cost;
+	run_cost += outer_path->total_cost - outer_path->startup_cost;
+	if (outer_path_rows > 1)
+		run_cost += (outer_path_rows - 1) * inner_rescan_start_cost;
+
+	inner_run_cost = inner_path->total_cost - inner_path->startup_cost;
+	inner_rescan_run_cost = inner_rescan_total_cost - inner_rescan_start_cost;
+
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI ||
+		extra->inner_unique)
+	{
+		/*
+		 * With a SEMI or ANTI join, or if the innerrel is known unique, the
+		 * executor will stop after the first match.
+		 *
+		 * Getting decent estimates requires inspection of the join quals,
+		 * which we choose to postpone to final_cost_cached_nestloop.
+		 */
+
+		/* Save private data for final_cost_cached_nestloop */
+		workspace->inner_run_cost = inner_run_cost;
+		workspace->inner_rescan_run_cost = inner_rescan_run_cost;
+	}
+	else
+	{
+		/* Normal case; we'll scan whole input rel for each outer row */
+		run_cost += inner_run_cost;
+		if (outer_path_rows > 1)
+			run_cost += (outer_path_rows - 1) * inner_rescan_run_cost;
+	}
+
+	/* CPU costs left for later */
+
+	/* Public result fields */
+	workspace->startup_cost = startup_cost;
+	workspace->total_cost = startup_cost + run_cost;
+	/* Save private data for final_cost_cached_nestloop */
+	workspace->run_cost = run_cost;
+
+	return est_entries;
+}
+
+/*
+ * final_cost_cached_nestloop
+ *	  Final estimate of the cost and result size of a cached nestloop join
+ *	  path.
+ *
+ * 'path' is already filled in except for the rows and cost fields
+ * 'workspace' is the result from initial_cost_cached_nestloop
+ * 'extra' contains miscellaneous information about the join
+ */
+void
+final_cost_cached_nestloop(PlannerInfo *root, NestPath *path,
+						   JoinCostWorkspace *workspace,
+						   JoinPathExtraData *extra)
+{
+	/*
+	 * The final costing is identical to final_cost_nestloop's.  We pass true
+	 * for 'enabled' since we wouldn't have got here if enable_cachednestloop
+	 * were false.
+	 */
+	final_cost_nestloop(root, path, workspace, extra, true);
 }
 
 /*
@@ -4502,14 +4728,14 @@ compute_semi_anti_join_factors(PlannerInfo *root,
 static bool
 has_indexed_join_quals(NestPath *joinpath)
 {
-	Relids		joinrelids = joinpath->path.parent->relids;
-	Path	   *innerpath = joinpath->innerjoinpath;
+	Relids		joinrelids = joinpath->jpath.path.parent->relids;
+	Path	   *innerpath = joinpath->jpath.innerjoinpath;
 	List	   *indexclauses;
 	bool		found_one;
 	ListCell   *lc;
 
 	/* If join still has quals to evaluate, it's not fast */
-	if (joinpath->joinrestrictinfo != NIL)
+	if (joinpath->jpath.joinrestrictinfo != NIL)
 		return false;
 	/* Nor if the inner path isn't parameterized at all */
 	if (innerpath->param_info == NULL)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..62572ab050 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,15 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +57,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +171,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving inner uniqueness here to allow
+			 * cached nested loops to be considered for semi/anti joins
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +367,188 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a cached nested loop join.
+ *
+ * We also fetch the outer-side exprs and check that each one has a valid
+ * hashable equality operator.  Returns true and sets the 'param_exprs' and
+ * 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 *
+	 * XXX Think about this harder. Any other restrictions to add here?
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+			var_relids = pull_varnos((Node *) ((PlaceHolderVar *) expr)->phexpr);
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * can_cached_nestloop
+ *		Returns true if it's possible to build a hash table inside a
+ *		parameterized nested loop to cache the results for the most
+ *		recently seen parameter values.
+ *
+ * Sets param_exprs to the cache key parameters and hash_operators to the
+ * hash operators for the cache upon returning true.
+ */
+static bool
+can_cached_nestloop(PlannerInfo *root, RelOptInfo *innerrel,
+					RelOptInfo *outerrel, Path *inner_path,
+					Path *outer_path, JoinType jointype,
+					JoinPathExtraData *extra, List **param_exprs,
+					List **hash_operators)
+{
+	/* Obviously not if it's disabled */
+	if (!enable_cachednestloop)
+		return false;
+
+	/*
+	 * We can safely skip all this unless we expect to perform more than one
+	 * inner scan, since the first scan is always going to be a cache miss.
+	 * Such a path would likely be rejected on cost later anyway, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return false;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return false;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the cache code
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return false;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									param_exprs,
+									hash_operators,
+									outerrel,
+									innerrel))
+		return true;
+
+	return false;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -548,6 +743,214 @@ try_partial_nestloop_path(PlannerInfo *root,
 										  NULL));
 }
 
+/*
+ * try_cached_nestloop_path
+ *	  Consider a cached nestloop join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_cached_nestloop_path(PlannerInfo *root,
+						 RelOptInfo *joinrel,
+						 Path *outer_path,
+						 Path *inner_path,
+						 List *pathkeys,
+						 JoinType jointype,
+						 JoinPathExtraData *extra,
+						 List *param_exprs,
+						 List *hash_operators)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+	RelOptInfo *innerrel = inner_path->parent;
+	RelOptInfo *outerrel = outer_path->parent;
+	Relids		innerrelids;
+	Relids		outerrelids;
+	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
+	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+	int			table_size;
+
+	/*
+	 * Paths are parameterized by top-level parents, so run parameterization
+	 * tests on the parent relids.
+	 */
+	if (innerrel->top_parent_relids)
+		innerrelids = innerrel->top_parent_relids;
+	else
+		innerrelids = innerrel->relids;
+
+	if (outerrel->top_parent_relids)
+		outerrelids = outerrel->top_parent_relids;
+	else
+		outerrelids = outerrel->relids;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible --- unless allow_star_schema_join
+	 * says to allow it anyway.  Also, we must reject if have_dangerous_phv
+	 * doesn't like the look of it, which could only happen if the nestloop is
+	 * still parameterized.
+	 */
+	required_outer = calc_nestloop_required_outer(outerrelids, outer_paramrels,
+												  innerrelids, inner_paramrels);
+	if (required_outer &&
+		((!bms_overlap(required_outer, extra->param_source_rels) &&
+		  !allow_star_schema_join(root, outerrelids, inner_paramrels)) ||
+		 have_dangerous_phv(root, outerrelids, inner_paramrels)))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Do a precheck to quickly eliminate obviously-inferior paths.  We
+	 * calculate a cheap lower bound on the path's cost and then use
+	 * add_path_precheck() to see if the path is clearly going to be dominated
+	 * by some existing path for the joinrel.  If not, do the full pushup with
+	 * creating a fully valid path structure and submitting it to add_path().
+	 * The latter two steps are expensive enough to make this two-phase
+	 * methodology worthwhile.
+	 */
+	table_size = initial_cost_cached_nestloop(root, &workspace, jointype,
+											  outer_path, inner_path, extra,
+											  param_exprs);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/*
+		 * If the inner path is parameterized, it is parameterized by the
+		 * topmost parent of the outer rel, not the outer rel itself.  Fix
+		 * that.
+		 */
+		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+		{
+			inner_path = reparameterize_path_by_child(root, inner_path,
+													  outer_path->parent);
+
+			/*
+			 * If we could not translate the path, we can't create the
+			 * nestloop path.
+			 */
+			if (!inner_path)
+			{
+				bms_free(required_outer);
+				return;
+			}
+		}
+
+		add_path(joinrel, (Path *)
+				 create_cached_nestloop_path(root,
+											 joinrel,
+											 jointype,
+											 &workspace,
+											 extra,
+											 outer_path,
+											 inner_path,
+											 extra->restrictlist,
+											 pathkeys,
+											 required_outer,
+											 table_size,
+											 param_exprs,
+											 hash_operators));
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * try_partial_cached_nestloop_path
+ *	  Consider a partial cached nestloop join path; if it appears useful, push
+ *	  it into the joinrel's partial_pathlist via add_partial_path().
+ */
+static void
+try_partial_cached_nestloop_path(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 Path *outer_path,
+								 Path *inner_path,
+								 List *pathkeys,
+								 JoinType jointype,
+								 JoinPathExtraData *extra,
+								 List *param_exprs,
+								 List *hash_operators)
+{
+	JoinCostWorkspace workspace;
+	int			table_size;
+
+	/*
+	 * If the inner path is parameterized, the parameterization must be fully
+	 * satisfied by the proposed outer path.  Parameterized partial paths are
+	 * not supported.  The caller should already have verified that no
+	 * extra_lateral_rels are required here.
+	 */
+	Assert(bms_is_empty(joinrel->lateral_relids));
+	if (inner_path->param_info != NULL)
+	{
+		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
+		RelOptInfo *outerrel = outer_path->parent;
+		Relids		outerrelids;
+
+		/*
+		 * The inner and outer paths are parameterized, if at all, by the top
+		 * level parents, not the child relations, so we must use those relids
+		 * for our parameterization tests.
+		 */
+		if (outerrel->top_parent_relids)
+			outerrelids = outerrel->top_parent_relids;
+		else
+			outerrelids = outerrel->relids;
+
+		if (!bms_is_subset(inner_paramrels, outerrelids))
+			return;
+	}
+
+	/*
+	 * Before creating a path, get a quick lower bound on what it is likely to
+	 * cost.  Bail out right away if it looks terrible.
+	 */
+	table_size = initial_cost_cached_nestloop(root, &workspace, jointype,
+											  outer_path, inner_path, extra,
+											  param_exprs);
+	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+		return;
+
+	/*
+	 * If the inner path is parameterized, it is parameterized by the topmost
+	 * parent of the outer rel, not the outer rel itself.  Fix that.
+	 */
+	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
+	{
+		inner_path = reparameterize_path_by_child(root, inner_path,
+												  outer_path->parent);
+
+		/*
+		 * If we could not translate the path, we can't create the nestloop path.
+		 */
+		if (!inner_path)
+			return;
+	}
+
+	/* Might be good enough to be worth trying, so let's try it. */
+	add_partial_path(joinrel, (Path *)
+					 create_cached_nestloop_path(root,
+												 joinrel,
+												 jointype,
+												 &workspace,
+												 extra,
+												 outer_path,
+												 inner_path,
+												 extra->restrictlist,
+												 pathkeys,
+												 NULL,
+												 table_size,
+												 param_exprs,
+												 hash_operators));
+}
+
 /*
  * try_mergejoin_path
  *	  Consider a merge join path; if it appears useful, push it into
@@ -1471,6 +1874,8 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				List	   *param_exprs;
+				List	   *hashoperators;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1884,21 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				if (can_cached_nestloop(root, innerrel, outerrel,
+										innerpath, outerpath, jointype,
+										extra, &param_exprs, &hashoperators))
+				{
+					try_cached_nestloop_path(root,
+											 joinrel,
+											 outerpath,
+											 innerpath,
+											 merge_pathkeys,
+											 jointype,
+											 extra,
+											 param_exprs,
+											 hashoperators);
+				}
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +2053,8 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			List	   *param_exprs;
+			List	   *hashoperators;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +2079,21 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			if (can_cached_nestloop(root, innerrel, outerrel,
+									innerpath, outerpath, jointype,
+									extra, &param_exprs, &hashoperators))
+			{
+				try_partial_cached_nestloop_path(root,
+												 joinrel,
+												 outerpath,
+												 innerpath,
+												 pathkeys,
+												 jointype,
+												 extra,
+												 param_exprs,
+												 hashoperators);
+			}
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 94280a730c..fc8daaded4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4077,22 +4077,26 @@ create_nestloop_plan(PlannerInfo *root,
 	NestLoop   *join_plan;
 	Plan	   *outer_plan;
 	Plan	   *inner_plan;
-	List	   *tlist = build_path_tlist(root, &best_path->path);
-	List	   *joinrestrictclauses = best_path->joinrestrictinfo;
+	List	   *tlist = build_path_tlist(root, &best_path->jpath.path);
+	List	   *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
 	List	   *joinclauses;
 	List	   *otherclauses;
 	Relids		outerrelids;
 	List	   *nestParams;
 	Relids		saveOuterRels = root->curOuterRels;
+	List	   *param_exprs = NIL;
 
 	/* NestLoop can project, so no need to be picky about child tlists */
-	outer_plan = create_plan_recurse(root, best_path->outerjoinpath, 0);
+	outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath, 0);
 
 	/* For a nestloop, include outer relids in curOuterRels for inner side */
 	root->curOuterRels = bms_union(root->curOuterRels,
-								   best_path->outerjoinpath->parent->relids);
+								   best_path->jpath.outerjoinpath->parent->relids);
+
+	inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath, 0);
 
-	inner_plan = create_plan_recurse(root, best_path->innerjoinpath, 0);
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
 
 	/* Restore curOuterRels */
 	bms_free(root->curOuterRels);
@@ -4103,10 +4107,10 @@ create_nestloop_plan(PlannerInfo *root,
 
 	/* Get the join qual clauses (in plain expression form) */
 	/* Any pseudoconstant clauses are ignored here */
-	if (IS_OUTER_JOIN(best_path->jointype))
+	if (IS_OUTER_JOIN(best_path->jpath.jointype))
 	{
 		extract_actual_join_clauses(joinrestrictclauses,
-									best_path->path.parent->relids,
+									best_path->jpath.path.parent->relids,
 									&joinclauses, &otherclauses);
 	}
 	else
@@ -4117,7 +4121,7 @@ create_nestloop_plan(PlannerInfo *root,
 	}
 
 	/* Replace any outer-relation variables with nestloop params */
-	if (best_path->path.param_info)
+	if (best_path->jpath.path.param_info)
 	{
 		joinclauses = (List *)
 			replace_nestloop_params(root, (Node *) joinclauses);
@@ -4129,7 +4133,7 @@ create_nestloop_plan(PlannerInfo *root,
 	 * Identify any nestloop parameters that should be supplied by this join
 	 * node, and remove them from root->curOuterParams.
 	 */
-	outerrelids = best_path->outerjoinpath->parent->relids;
+	outerrelids = best_path->jpath.outerjoinpath->parent->relids;
 	nestParams = identify_current_nestloop_params(root, outerrelids);
 
 	join_plan = make_nestloop(tlist,
@@ -4138,10 +4142,42 @@ create_nestloop_plan(PlannerInfo *root,
 							  nestParams,
 							  outer_plan,
 							  inner_plan,
-							  best_path->jointype,
-							  best_path->inner_unique);
+							  best_path->jpath.jointype,
+							  best_path->jpath.inner_unique);
+
+	if (best_path->use_cache)
+	{
+		Oid		   *operators;
+		Oid		   *collations;
+		ListCell   *lc;
+		ListCell   *lc2;
+		int			nkeys;
+		int			i;
+
+		join_plan->numKeys = nkeys = list_length(param_exprs);
+		Assert(nkeys > 0);
+		operators = palloc(nkeys * sizeof(Oid));
+		collations = palloc(nkeys * sizeof(Oid));
+
+		i = 0;
+		forboth(lc, param_exprs, lc2, best_path->hash_operators)
+		{
+			Expr	   *param_expr = (Expr *) lfirst(lc);
+			Oid			opno = lfirst_oid(lc2);
+
+			operators[i] = opno;
+			collations[i] = exprCollation((Node *) param_expr);
+			i++;
+		}
+		join_plan->mrucache = true;
+		join_plan->param_exprs = param_exprs;
+		join_plan->hashOperators = operators;
+		join_plan->collations = collations;
+		join_plan->singlerow = best_path->singlerow;
+		join_plan->est_entries = best_path->est_entries;
+	}
 
-	copy_generic_path_info(&join_plan->join.plan, &best_path->path);
+	copy_generic_path_info(&join_plan->join.plan, &best_path->jpath.path);
 
 	return join_plan;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 138a353f93..93a42a0521 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2371,10 +2371,10 @@ create_nestloop_path(PlannerInfo *root,
 		restrict_clauses = jclauses;
 	}
 
-	pathnode->path.pathtype = T_NestLoop;
-	pathnode->path.parent = joinrel;
-	pathnode->path.pathtarget = joinrel->reltarget;
-	pathnode->path.param_info =
+	pathnode->jpath.path.pathtype = T_NestLoop;
+	pathnode->jpath.path.parent = joinrel;
+	pathnode->jpath.path.pathtarget = joinrel->reltarget;
+	pathnode->jpath.path.param_info =
 		get_joinrel_parampathinfo(root,
 								  joinrel,
 								  outer_path,
@@ -2382,19 +2382,129 @@ create_nestloop_path(PlannerInfo *root,
 								  extra->sjinfo,
 								  required_outer,
 								  &restrict_clauses);
-	pathnode->path.parallel_aware = false;
-	pathnode->path.parallel_safe = joinrel->consider_parallel &&
+	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
 		outer_path->parallel_safe && inner_path->parallel_safe;
 	/* This is a foolish way to estimate parallel_workers, but for now... */
-	pathnode->path.parallel_workers = outer_path->parallel_workers;
-	pathnode->path.pathkeys = pathkeys;
-	pathnode->jointype = jointype;
-	pathnode->inner_unique = extra->inner_unique;
-	pathnode->outerjoinpath = outer_path;
-	pathnode->innerjoinpath = inner_path;
-	pathnode->joinrestrictinfo = restrict_clauses;
+	pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
+	pathnode->jpath.path.pathkeys = pathkeys;
+	pathnode->jpath.jointype = jointype;
+	pathnode->jpath.inner_unique = extra->inner_unique;
+	pathnode->jpath.outerjoinpath = outer_path;
+	pathnode->jpath.innerjoinpath = inner_path;
+	pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+	/* Zero out the fields specific to Cached Nested Loop */
+	pathnode->use_cache = false;
+	pathnode->singlerow = false;
+	pathnode->est_entries = 0;
+	pathnode->hash_operators = NIL;
+	pathnode->param_exprs = NIL;
+
+	final_cost_nestloop(root, pathnode, workspace, extra, enable_nestloop);
+
+	return pathnode;
+}
+
+/*
+ * create_cached_nestloop_path
+ *	  Creates a pathnode corresponding to a cached nestloop join between two
+ *	  relations.
+ *
+ * 'joinrel' is the join relation.
+ * 'jointype' is the type of join required
+ * 'workspace' is the result from initial_cost_cached_nestloop
+ * 'extra' contains various information about the join
+ * 'outer_path' is the outer path
+ * 'inner_path' is the inner path
+ * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
+ * 'pathkeys' are the path keys of the new join path
+ * 'required_outer' is the set of required outer rels
+ * 'table_size' is the number of initial buckets to make in the cache hash
+ *		table, or 0 if the executor should just decide.
+ * 'param_exprs' are exprs from the outer side of the join to use as cache keys
+ * 'hash_operators' is the hash operator Oid for each 'param_exprs' item
+ *
+ * Returns the resulting path node.
+ */
+NestPath *
+create_cached_nestloop_path(PlannerInfo *root,
+							RelOptInfo *joinrel,
+							JoinType jointype,
+							JoinCostWorkspace *workspace,
+							JoinPathExtraData *extra,
+							Path *outer_path,
+							Path *inner_path,
+							List *restrict_clauses,
+							List *pathkeys,
+							Relids required_outer,
+							int table_size,
+							List *param_exprs,
+							List *hash_operators)
+{
+	NestPath   *pathnode = makeNode(NestPath);
+	Relids		inner_req_outer = PATH_REQ_OUTER(inner_path);
+
+	/*
+	 * If the inner path is parameterized by the outer, we must drop any
+	 * restrict_clauses that are due to be moved into the inner path.  We have
+	 * to do this now, rather than postpone the work till createplan time,
+	 * because the restrict_clauses list can affect the size and cost
+	 * estimates for this path.
+	 */
+	if (bms_overlap(inner_req_outer, outer_path->parent->relids))
+	{
+		Relids		inner_and_outer = bms_union(inner_path->parent->relids,
+												inner_req_outer);
+		List	   *jclauses = NIL;
+		ListCell   *lc;
+
+		foreach(lc, restrict_clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+			if (!join_clause_is_movable_into(rinfo,
+											 inner_path->parent->relids,
+											 inner_and_outer))
+				jclauses = lappend(jclauses, rinfo);
+		}
+		restrict_clauses = jclauses;
+	}
+
+	pathnode->jpath.path.pathtype = T_NestLoop;
+	pathnode->jpath.path.parent = joinrel;
+	pathnode->jpath.path.pathtarget = joinrel->reltarget;
+	pathnode->jpath.path.param_info =
+		get_joinrel_parampathinfo(root,
+								  joinrel,
+								  outer_path,
+								  inner_path,
+								  extra->sjinfo,
+								  required_outer,
+								  &restrict_clauses);
+	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	/* This is a foolish way to estimate parallel_workers, but for now... */
+	pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
+	pathnode->jpath.path.pathkeys = pathkeys;
+	pathnode->jpath.jointype = jointype;
+	pathnode->jpath.inner_unique = extra->inner_unique;
+	pathnode->jpath.outerjoinpath = outer_path;
+	pathnode->jpath.innerjoinpath = inner_path;
+	pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+	pathnode->use_cache = true;
+	pathnode->singlerow = extra->inner_unique;
+	pathnode->est_entries = table_size;
+	pathnode->param_exprs = param_exprs;
+	pathnode->hash_operators = hash_operators;
+
+	/* initial_cost_cached_nestloop() already did the final costs */
+	pathnode->jpath.path.startup_cost = workspace->startup_cost;
+	pathnode->jpath.path.total_cost = workspace->total_cost;
 
-	final_cost_nestloop(root, pathnode, workspace, extra);
+	final_cost_cached_nestloop(root, pathnode, workspace, extra);
 
 	return pathnode;
 }
@@ -4018,13 +4128,15 @@ do { \
 		case T_NestPath:
 			{
 				JoinPath   *jpath;
+				NestPath   *npath;
 
-				FLAT_COPY_PATH(jpath, path, NestPath);
+				FLAT_COPY_PATH(npath, path, NestPath);
 
+				jpath = (JoinPath *) npath;
 				REPARAMETERIZE_CHILD_PATH(jpath->outerjoinpath);
 				REPARAMETERIZE_CHILD_PATH(jpath->innerjoinpath);
 				ADJUST_CHILD_ATTRS(jpath->joinrestrictinfo);
-				new_path = (Path *) jpath;
+				new_path = (Path *) npath;
 			}
 			break;
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6c656586e8..7c8f412767 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -7391,12 +7391,11 @@ find_param_referent(Param *param, deparse_context *context,
 			ListCell   *lc2;
 
 			/*
-			 * NestLoops transmit params to their inner child only; also, once
-			 * we've crawled up out of a subplan, this couldn't possibly be
-			 * the right match.
+			 * NestLoops transmit params to either side of the join; also,
+			 * once we've crawled up out of a subplan, this couldn't possibly
+			 * be the right match.
 			 */
 			if (IsA(ancestor, NestLoop) &&
-				child_plan == innerPlan(ancestor) &&
 				in_same_plan_level)
 			{
 				NestLoop   *nl = (NestLoop *) ancestor;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..b813cfac5e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1016,6 +1016,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_cachednestloop", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of cached nested-loop join plans."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_cachednestloop,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..f1b738c971 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -362,6 +362,7 @@
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
+#enable_cachednestloop = on
 #enable_parallel_append = on
 #enable_seqscan = on
 #enable_sort = on
diff --git a/src/include/executor/execMRUTupleCache.h b/src/include/executor/execMRUTupleCache.h
new file mode 100644
index 0000000000..e6ceef9086
--- /dev/null
+++ b/src/include/executor/execMRUTupleCache.h
@@ -0,0 +1,97 @@
+/*-------------------------------------------------------------------------
+ *
+ * execMRUTupleCache.h
+ *	  Routines setting up and using a most-recently-used cache to store sets
+ *	  of tuples for a given cache key.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/include/executor/execMRUTupleCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXEC_MRUTUPLECACHE_H
+#define EXEC_MRUTUPLECACHE_H
+
+#include "nodes/execnodes.h"
+
+typedef struct MRUCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} MRUCacheInstrumentation;
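+
+/*
+ * Editor's note (not part of the original patch): these counters back the
+ * per-node line that EXPLAIN ANALYZE prints for a Cached Nested Loop, e.g.
+ *
+ *		Hits: 980  Misses: 20  Evictions: 0  Overflows: 0  Memory Usage: 3kB
+ */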
+
+/* ----------------
+ *	 Shared memory container for per-worker MRU cache information
+ * ----------------
+ */
+typedef struct SharedMRUCacheInfo
+{
+	int			num_workers;
+	MRUCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedMRUCacheInfo;
+
+/* ----------------
+ *	MRUTupleCache information
+ *
+ *		Main data structure for MRUTupleCache.
+ * ----------------
+ */
+typedef struct MRUTupleCache
+{
+	PlanState	*subplan;		/* subplan to read and cache tuples from */
+	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
+	int			state;		/* value of MRUTupleCache's state machine */
+	int			nkeys;			/* number of hash table keys */
+	struct mrucache_hash *hashtable; /* hash table cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for hash keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *cachefoundslot; /* Slot to return found cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* memory limit in bytes for the cache */
+	uint64		mem_lowerlimit; /* reduce memory usage to below this when we
+								 * free up space */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct MRUCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct MRUCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	MRUCacheInstrumentation stats;	/* execution statistics */
+} MRUTupleCache;
+
+extern void ExecMRUTupleCacheFinishScan(MRUTupleCache *mrucache);
+extern TupleTableSlot *ExecMRUTupleCacheFetch(MRUTupleCache *mrucache);
+extern MRUTupleCache *ExecMRUTupleCacheInit(PlanState *planstate,
+											PlanState *cache_planstate,
+											List *param_exprs,
+											Oid *hashOperators,
+											Oid *collations,
+											uint64 memory_limit_bytes,
+											int est_entries,
+											bool singlerow);
+extern void ExecMRUTupleCacheCleanup(MRUTupleCache *mrucache);
+extern double ExecEstimateMRUCacheEntryOverheadBytes(double ntuples);
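+
+/*
+ * Editor's illustrative usage sketch (not part of the original patch; the
+ * exact call sequence used by nodeNestloop.c may differ, and the variable
+ * names below are assumptions):
+ *
+ *		cache = ExecMRUTupleCacheInit(&nlstate->js.ps, innerPlanState(nlstate),
+ *									  plan->param_exprs, plan->hashOperators,
+ *									  plan->collations, work_mem * 1024L,
+ *									  plan->est_entries, plan->singlerow);
+ *
+ * Each parameterized rescan then calls ExecMRUTupleCacheFetch() repeatedly to
+ * read tuples, calls ExecMRUTupleCacheFinishScan() once the scan is done, and
+ * ExecMRUTupleCacheCleanup() is called at executor shutdown.
+ */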
+
+#endif							/* EXEC_MRUTUPLECACHE_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index b7978cd22e..993919dbe2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -262,6 +262,12 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc ldesc,
+										 const TupleTableSlotOps *lops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeNestloop.h b/src/include/executor/nodeNestloop.h
index 5a048a799f..1e4c729bd7 100644
--- a/src/include/executor/nodeNestloop.h
+++ b/src/include/executor/nodeNestloop.h
@@ -20,4 +20,9 @@ extern NestLoopState *ExecInitNestLoop(NestLoop *node, EState *estate, int eflag
 extern void ExecEndNestLoop(NestLoopState *node);
 extern void ExecReScanNestLoop(NestLoopState *node);
 
+extern void ExecNestLoopEstimate(NestLoopState *node, ParallelContext *pcxt);
+extern void ExecNestLoopInitializeDSM(NestLoopState *node, ParallelContext *pcxt);
+extern void ExecNestLoopInitializeWorker(NestLoopState *node, ParallelWorkerContext *pwcxt);
+extern void ExecNestLoopRetrieveInstrumentation(NestLoopState *node);
+
 #endif							/* NODENESTLOOP_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
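+/*
+ * Editor's note (not part of the original patch): the MRU tuple cache can use
+ * this on a cache hit, e.g.
+ *
+ *		dlist_move_tail(&mrucache->lru_list, &entry->lru_node);
+ *
+ * (the entry's list-node field name is an assumption) so that recently used
+ * entries migrate to the tail of lru_list and eviction can simply work from
+ * the head of the list.
+ */
+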
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index dc1f1df07e..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -921,11 +921,11 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 	tb->members--;
 
 	/*
-	 * Backward shift following elements till either an empty element
-	 * or an element at its optimal position is encountered.
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
 	 *
-	 * While that sounds expensive, the average chain length is short,
-	 * and deletions would otherwise require tombstones.
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
 	 */
 	while (true)
 	{
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d68d6..bfb9d979f8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1855,6 +1856,14 @@ typedef struct NestLoopState
 	bool		nl_NeedNewOuter;
 	bool		nl_MatchedOuter;
 	TupleTableSlot *nl_NullInnerTupleSlot;
+	struct MRUTupleCache *nl_mrucache;
+	ProjectionInfo *ps_CacheProjInfo;	/* info for doing tuple projection */
+	ExprState	   *ps_CacheQual;
+	ExprState	   *ps_CacheJoinqual;
+	ProjectionInfo *ps_ScanProjInfo;	/* info for doing tuple projection */
+	ExprState	   *ps_ScanQual;
+	ExprState	   *ps_ScanJoinqual;
+	struct SharedMRUCacheInfo *shared_info; /* statistics for parallel workers */
 } NestLoopState;
 
 /* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3dd16b9ad5..29da4f57fb 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1534,11 +1534,41 @@ typedef struct JoinPath
 } JoinPath;
 
 /*
- * A nested-loop path needs no special fields.
+ * A standard non-cached nested loop only requires the fields supplied by
+ * JoinPath.  Cached Nested Loops require the following additional fields:
+ *
+ * 'use_cache' to indicate if the parameterized inner results should be saved
+ * for a later execution which uses the same parameter values.  When false
+ * this is just a normal nested loop.
+ *
+ * 'singlerow' instructs the caching code to mark a cache entry as complete
+ * after we find the first row.  This is useful for unique joins where we stop
+ * trying to read additional rows after getting the first match.  Without this
+ * we'd leave the cache entry incomplete and be unable to use it on the next
+ * lookup.
+ *
+ * 'est_entries' is the planner's best guess at how large to make the hash
+ * table for the cache.  0 can be specified if the value is unknown.
+ *
+ * 'hash_operators' list of Oids for hash operators for each 'param_exprs'.
+ *
+ * 'param_exprs' vars/exprs from the outer side of the join which we use for
+ * the cache's key.
  */
 
-typedef JoinPath NestPath;
+typedef struct NestPath
+{
+	JoinPath	jpath;
 
+	bool		use_cache;
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+} NestPath;
+
 /*
  * A mergejoin path has these fields.
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7e6b10f86b..f2f31a6db6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -701,6 +701,19 @@ typedef struct NestLoop
 {
 	Join		join;
 	List	   *nestParams;		/* list of NestLoopParam nodes */
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each cache key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		mrucache;		/* True if parameterized nested loop is to
+								 * cache rows from repeat scans. */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
 } NestLoop;
 
 typedef struct NestLoopParam
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6141654e47..29ee866892 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_cachednestloop;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
@@ -137,7 +138,17 @@ extern void initial_cost_nestloop(PlannerInfo *root,
 								  JoinPathExtraData *extra);
 extern void final_cost_nestloop(PlannerInfo *root, NestPath *path,
 								JoinCostWorkspace *workspace,
-								JoinPathExtraData *extra);
+								JoinPathExtraData *extra,
+								bool enabled);
+extern int initial_cost_cached_nestloop(PlannerInfo *root,
+										JoinCostWorkspace *workspace,
+										JoinType jointype,
+										Path *outer_path, Path *inner_path,
+										JoinPathExtraData *extra,
+										List *param_exprs);
+extern void final_cost_cached_nestloop(PlannerInfo *root, NestPath *path,
+									   JoinCostWorkspace *workspace,
+									   JoinPathExtraData *extra);
 extern void initial_cost_mergejoin(PlannerInfo *root,
 								   JoinCostWorkspace *workspace,
 								   JoinType jointype,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..562096e6c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -146,6 +146,20 @@ extern NestPath *create_nestloop_path(PlannerInfo *root,
 									  List *pathkeys,
 									  Relids required_outer);
 
+extern NestPath *create_cached_nestloop_path(PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 JoinType jointype,
+											 JoinCostWorkspace *workspace,
+											 JoinPathExtraData *extra,
+											 Path *outer_path,
+											 Path *inner_path,
+											 List *restrict_clauses,
+											 List *pathkeys,
+											 Relids required_outer,
+											 int table_size,
+											 List *param_exprs,
+											 List *hash_operators);
+
 extern MergePath *create_mergejoin_path(PlannerInfo *root,
 										RelOptInfo *joinrel,
 										JoinType jointype,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 455e1343ee..57ca9fda8d 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -70,9 +70,9 @@
  * callers to provide further details about some assumptions which were made
  * during the estimation.
  */
-#define SELFLAG_USED_DEFAULT		(1 << 0) /* Estimation fell back on one of
-											  * the DEFAULTs as defined above.
-											  */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..0453b0ba91 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_cachednestloop to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_cachednestloop;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -4056,8 +4058,9 @@ select * from
 where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
- Nested Loop
+ Cached Nested Loop
    Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+   Cache Key: i8.q1
    Join Filter: (t1.f1 = t2.f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
@@ -4072,7 +4075,7 @@ where t1.f1 = ss.f1;
          Output: (i8.q1), t2.f1
          ->  Seq Scan on public.text_tbl t2
                Output: i8.q1, t2.f1
-(16 rows)
+(17 rows)
 
 select * from
   text_tbl t1
@@ -4095,11 +4098,13 @@ select * from
 where t1.f1 = ss2.f1;
                             QUERY PLAN                             
 -------------------------------------------------------------------
- Nested Loop
+ Cached Nested Loop
    Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
+   Cache Key: (i8.q1), t2.f1
    Join Filter: (t1.f1 = (t2.f1))
-   ->  Nested Loop
+   ->  Cached Nested Loop
          Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Cache Key: i8.q1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4117,7 +4122,7 @@ where t1.f1 = ss2.f1;
          Output: ((i8.q1)), (t2.f1)
          ->  Seq Scan on public.text_tbl t3
                Output: (i8.q1), t2.f1
-(22 rows)
+(24 rows)
 
 select * from
   text_tbl t1
@@ -4141,8 +4146,9 @@ select 1 from
 where tt1.f1 = ss1.c0;
                         QUERY PLAN                        
 ----------------------------------------------------------
- Nested Loop
+ Cached Nested Loop
    Output: 1
+   Cache Key: tt4.f1
    ->  Nested Loop Left Join
          Output: tt1.f1, tt4.f1
          ->  Nested Loop
@@ -4170,7 +4176,7 @@ where tt1.f1 = ss1.c0;
                Output: (tt4.f1)
                ->  Seq Scan on public.text_tbl tt5
                      Output: tt4.f1
-(29 rows)
+(30 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4811,20 +4817,22 @@ explain (costs off)
                    QUERY PLAN                   
 ------------------------------------------------
  Aggregate
-   ->  Nested Loop
+   ->  Cached Nested Loop
+         Cache Key: a.two
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+(5 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
                    QUERY PLAN                   
 ------------------------------------------------
  Aggregate
-   ->  Nested Loop
+   ->  Cached Nested Loop
+         Cache Key: a.two
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+(5 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
@@ -4832,10 +4840,11 @@ explain (costs off)
                    QUERY PLAN                   
 ------------------------------------------------
  Aggregate
-   ->  Nested Loop
+   ->  Cached Nested Loop
+         Cache Key: a.two
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+(5 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4890,13 +4899,13 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Cached Nested Loop
+         Cache Key: "*VALUES*".column1
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
-               ->  Index Only Scan using tenk1_unique2 on tenk1 b
+         ->  Index Only Scan using tenk1_unique2 on tenk1 b
+               Index Cond: (unique2 = "*VALUES*".column1)
 (8 rows)
 
 select count(*) from tenk1 a,
@@ -6286,3 +6295,91 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+-- Tests for Cached Nested Loops
+-- Ensure we get a cached nested loop plan
+explain (analyze, costs off, timing off, summary off)
+select count(*),avg(t1.unique1) from tenk1 t1
+inner join tenk1 t2 on t1.unique1 = t2.twenty
+where t2.unique1 < 1000;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Cached Nested Loop (actual rows=1000 loops=1)
+         Cache Key: t2.twenty
+         Hits: 980  Misses: 20  Evictions: 0  Overflows: 0  Memory Usage: 3kB
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+               Index Cond: (unique1 = t2.twenty)
+               Heap Fetches: 0
+(12 rows)
+
+-- and check we get the expected results.
+select count(*),avg(t1.unique1) from tenk1 t1
+inner join tenk1 t2 on t1.unique1 = t2.twenty
+where t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- try reducing work to test the cache eviction code.
+set work_mem to 64;
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+explain (analyze, costs off, timing off, summary off)
+select count(*),avg(t1.unique1) from tenk1 t1
+inner join tenk1 t2 on t1.unique1 = t2.thousand
+where t2.unique1 < 1000;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Cached Nested Loop (actual rows=1000 loops=1)
+         Cache Key: t2.thousand
+         Hits: 0  Misses: 1000  Evictions: 378  Overflows: 0  Memory Usage: 65kB
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=1000)
+               Index Cond: (unique1 = t2.thousand)
+               Heap Fetches: 0
+(12 rows)
+
+reset enable_mergejoin;
+reset enable_hashjoin;
+reset work_mem;
+-- Try with LATERAL joins
+explain (analyze, costs off, timing off, summary off)
+select count(*),avg(t2.unique1) from tenk1 t1,
+lateral (select t2.unique1 from tenk1 t2 where t1.twenty = t2.unique1) t2
+where t1.unique1 < 1000;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Cached Nested Loop (actual rows=1000 loops=1)
+         Cache Key: t1.twenty
+         Hits: 980  Misses: 20  Evictions: 0  Overflows: 0  Memory Usage: 3kB
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+               Index Cond: (unique1 = t1.twenty)
+               Heap Fetches: 0
+(12 rows)
+
+-- and check we get the expected results.
+select count(*),avg(t2.unique1) from tenk1 t1,
+lateral (select t2.unique1 from tenk1 t2 where t1.twenty = t2.unique1) t2
+where t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 50d2a7e4b9..97b200e482 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2065,7 +2068,9 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
          Workers Planned: 1
          Workers Launched: N
          ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Nested Loop (actual rows=N loops=N)
+               ->  Cached Nested Loop (actual rows=N loops=N)
+                     Cache Key: a.a
+                     Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
                      ->  Append (actual rows=N loops=N)
@@ -2087,7 +2092,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
                                  Index Cond: (a = a.a)
-(27 rows)
+(29 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
@@ -2099,7 +2104,9 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
          Workers Planned: 1
          Workers Launched: N
          ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Nested Loop (actual rows=N loops=N)
+               ->  Cached Nested Loop (actual rows=N loops=N)
+                     Cache Key: (a.a + 0)
+                     Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
                      ->  Append (actual rows=N loops=N)
@@ -2121,7 +2128,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
                                  Index Cond: (a = (a.a + 0))
-(27 rows)
+(29 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
@@ -2132,7 +2139,9 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
          Workers Planned: 1
          Workers Launched: N
          ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Nested Loop (actual rows=N loops=N)
+               ->  Cached Nested Loop (actual rows=N loops=N)
+                     Cache Key: a.a
+                     Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
                      ->  Append (actual rows=N loops=N)
@@ -2154,7 +2163,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
                                  Index Cond: (a = a.a)
-(27 rows)
+(29 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
                                         explain_parallel_append                                         
@@ -2164,7 +2173,9 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
          Workers Planned: 1
          Workers Launched: N
          ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Nested Loop (actual rows=N loops=N)
+               ->  Cached Nested Loop (actual rows=N loops=N)
+                     Cache Key: a.a
+                     Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
@@ -2187,7 +2198,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
                                  Index Cond: (a = a.a)
-(28 rows)
+(30 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
@@ -2198,7 +2209,9 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
          Workers Planned: 1
          Workers Launched: N
          ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Nested Loop (actual rows=N loops=N)
+               ->  Cached Nested Loop (actual rows=N loops=N)
+                     Cache Key: a.a
+                     Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
@@ -2221,7 +2234,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
                                  Index Cond: (a = a.a)
-(28 rows)
+(30 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 9d56cdacf3..cc4de0e8c3 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1094,7 +1094,8 @@ where o.ten = 1;
                     QUERY PLAN                     
 ---------------------------------------------------
  Aggregate
-   ->  Nested Loop
+   ->  Cached Nested Loop
+         Cache Key: o.four
          ->  Seq Scan on onek o
                Filter: (ten = 1)
          ->  CTE Scan on x
@@ -1103,7 +1104,7 @@ where o.ten = 1;
                        ->  Result
                        ->  WorkTable Scan on x x_1
                              Filter: (a < 10)
-(10 rows)
+(11 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..bf3eaaccf1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -89,6 +89,7 @@ select name, setting from pg_settings where name like 'enable%';
               name              | setting 
 --------------------------------+---------
  enable_bitmapscan              | on
+ enable_cachednestloop          | on
  enable_gathermerge             | on
  enable_hashagg                 | on
  enable_hashjoin                | on
@@ -106,7 +107,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..90ccba69de 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_cachednestloop to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_cachednestloop;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
@@ -2171,3 +2173,39 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+-- Tests for Cached Nested Loops
+-- Ensure we get a cached nested loop plan
+explain (analyze, costs off, timing off, summary off)
+select count(*),avg(t1.unique1) from tenk1 t1
+inner join tenk1 t2 on t1.unique1 = t2.twenty
+where t2.unique1 < 1000;
+
+-- and check we get the expected results.
+select count(*),avg(t1.unique1) from tenk1 t1
+inner join tenk1 t2 on t1.unique1 = t2.twenty
+where t2.unique1 < 1000;
+
+-- try reducing work to test the cache eviction code.
+set work_mem to 64;
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+explain (analyze, costs off, timing off, summary off)
+select count(*),avg(t1.unique1) from tenk1 t1
+inner join tenk1 t2 on t1.unique1 = t2.thousand
+where t2.unique1 < 1000;
+
+reset enable_mergejoin;
+reset enable_hashjoin;
+reset work_mem;
+
+-- Try with LATERAL joins
+explain (analyze, costs off, timing off, summary off)
+select count(*),avg(t2.unique1) from tenk1 t1,
+lateral (select t2.unique1 from tenk1 t2 where t1.twenty = t2.unique1) t2
+where t1.unique1 < 1000;
+
+-- and check we get the expected results.
+select count(*),avg(t2.unique1) from tenk1 t1,
+lateral (select t2.unique1 from tenk1 t2 where t1.twenty = t2.unique1) t2
+where t1.unique1 < 1000;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1e904a8c5b..7ee792506d 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
-- 
2.21.0.windows.1

hundred_rows_per_rescan.png (image/png)
one_row_per_rescan.png (image/png)
#52 David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#51)
2 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml@gmail.com> wrote:

So far benchmarking shows there's still a regression from the v8
version of the patch. This is using count(*). An earlier test [1] did
show speedups when we needed to deform tuples returned by the nested
loop node. I've not yet repeated that test. I was disappointed
to see v9 slower than v8 after having spent about 3 days rewriting the
patch.

I did some further tests, this time with some tuple deforming. Again,
it does seem that v9 is slower than v8.
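
To make the two workload shapes concrete, they look roughly like the
queries below (illustrative only; the table and column names
outer_tbl, inner_tbl, i_id and payload are placeholders, not the
actual benchmark schema). The first is the count(*) case, where no
columns are needed from the join output; the second references an
inner-side column, which forces the executor to deform the tuples
coming out of the join's inner side:

-- aggregate-only shape: no columns are needed from the join output
select count(*)
from outer_tbl o
inner join inner_tbl i on i.id = o.i_id;

-- deforming shape: an inner-side column is referenced, so the tuples
-- returned by the nested loop's inner side must be deformed
select sum(i.payload)
from outer_tbl o
inner join inner_tbl i on i.id = o.i_id;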

Graphs attached

Looking at profiles, I don't really see any obvious reason as to why
this is. I'm very much inclined to just pursue the v8 patch (separate
Result Cache node) and just drop the v9 idea altogether.
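
For anyone following along, the plan-shape difference between the two
approaches is roughly this (a sketch only; table, index and column
names are placeholders, and the exact v8 EXPLAIN output may differ).
v9 folds the cache into the join node itself, as in the "Cached Nested
Loop ... Cache Key" output earlier in this patch, while v8 keeps an
ordinary Nested Loop and injects a separate Result Cache node above
the parameterized inner scan:

   Nested Loop
     ->  Seq Scan on outer_tbl o
     ->  Result Cache
           Cache Key: o.x
           ->  Index Scan using inner_tbl_x_idx on inner_tbl i
                 Index Cond: (x = o.x)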

David

Attachments:

resultcache_basic_deforming.png (image/png)
resultcache_full_deforming.png (image/png)
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+���0x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���	�G`��@��%���K2�g���i�d��.�T�]��{b`O�+�g���_��;����\�r�w��`����>�$S}vI�����:��Lu�%���KP{b`O��p������~����%K�s�QGqH��a����>�$S}vI�����:��Lu�%���KP{b`O��p���]��m���jm�Q��}���w����7������n_��W�����{�:���<P���K2�g�d��.A-��.�T�]��N�5�'����WPO�����6�d����{��b��?�i�|���v�m�V�X1���<�G`�@��.�T�]��>���N�$S�vI�:�����{"\q@=eGqD�l������;wK���?=\���?���1x����
T�]��>�$S}v	jQ�vI�:��Lu�%��=1�'�D���z����7�IDAT;��o����?�]v�es����:k8�����5w�c����>�$S}vI�����:��Lu�%���KP{b`O��p��]s�5m�w��M��/_>w�c����>�$S}vI�����:��Lu�%���KP{b`O��p�����������_���x�}������k�e�����p�!�ig<��Y����g�<���W���m�N�k�q ��g���A��%�qP}vI~��Gu�%�qP�vI~T�]�2�aO���=1����Af/�r}��z���"���.7y@}��G���f��������2�y��,ow���ow��'=F�N!�B�	{b�=1��X#�r}��z�8����v��+V��N?������O��e��8����O~�B!�B!�B�K�������WH?�A����?����t�Am��6k��s�plh8�������iw����c��v�i����������t[�dI{����.����{�����l�����JOr�{�c8��
�3���n���}���4�
�����0P��j�Tp@
`����k�e�]����?��h_|q���+���u\}����/l�_��-�5��3k?7���3\���L���5���������;��������K/��-]���������x�:�/_�6�d�v�Yg��2]c}?��s��+V�����}��������j�]?���zu<@Ps@
������j�>p@
`�������K�k_����6��s�9g����pv��U�?!���~5D����������/�M��+��r���n��6�Z�����������L������W��>����o�5PD>����]��E���9e�����sE~>Z�����Y�s�����o�\�o�����k,��?���|��o}�=��n�s����n�9fm����������Y
�����z39�^�d�|�+o&����~���O�����_\�p'/E�����>��nM�F^x{���g��6�|�������������>�|������/e�ad�������.�~������?��3��39�=��S���.�k�O��@w�����AO~>Z�����O~������_�8��}�'>���+_���uou�[���>z��9�=������sx��1�9d�������<XW����~=��j�M�/��/��6����v�k�����)��2������g<����������.��p�g����;l8l��=����� ��_�b��b�/������z�����u�L�3��w���wr��9}�����k����3����������������=�y��>�w�a��{���}�%f��v�.�=�����ycr��P�������~������1�z���N;�t�������=mx�z�����;��v��3�9x������[n]�?��~����<�������m����~�Y�s�+��4y�9��C�_D�����>�XwPX��!��u�����n��+�����nr��J�a��g?���s'��b����� ���}n��x���s���M�'�Wb���?���{�?C�~&�g�y�p����>w��]���?����8�f�d���G���������@>������z��B�z����e��9�?W�%I}��w�}x����~��7z����_�_q��������?����5���-�9���y�2������5�o��O��&/"Z�,���9��j����w�e�����_�z��6k��G����q��o�?���|���0�z7w@���0�0�l��v�������������[xy�������~�N<�������8�f__bT�8��a9������[���}������X��������|~��*��{��sZ~^���6_���-�9�E�5����'��������?uk3�-�y�����z����;P�K��G�P���?g�_���WFO.�1��^��a������b�K_��p{Z&���O['_k�f�>�����0{������9�wS���z����u�?om����@>�P/�9�[�s	�1��&�j��<g�����<�XwPX���pK������~4|<��?������s�e�
�^��L���D�/����/�s�u����z��������~�v�UW
���C���~���'=i���O.�1�Z��>��O
��??[��`�,f������/*��������-��M>�_�c�|�0�2k����������z��1�b�C`]L������8����;�b����y�����z�a_����L�[��n��X���m��>���7����0y������029d��P�/���������_�c�������~��~�
u(�_U=�Zk�3�����{0�{���?���7������?�������������2��������&���;y�����k���������������>�t�}�u1y^�����/�n�������5�QG5��6��b���;��w��
����%����~����[�}!�C@�!�2�z����M��|��g�����'_�3����q������C���{r�����������-9�V?�������L�?�s������������o�_o��uK������YO���������n����?���������n��}�Y�s�+5��������O�4��c�v���9��j�]?H���W.�p���lj�5��0�/�10�~����&����c�=ntX���`2������<����>�{������~�~
���N���������I}>����%f��G���=�qm�}�Y�~k����j��9c������/��M>������?��w�C�s�=������=�1�����W����N~���5���g����k�9f��y������w������,vs��u�5���/5�@�_���/����z�!�[������s'Y�0�?or�5�9������g?;��!�=�����o�?����u����_O�������X�n����[x���z���?gL�?�s�b�^���{���b���sG~���y�Y��<XW�y��5��</-v��n8�L�����0P��j�Tp@
�
�S�5`*8�L�����0P��j�Tp@
�
�S�5`*8�L�����0P��j�Tp@
�
�S�5`*8�L�����0P��j�Tp@
�
�S�5�����?�����]w�us�,N������C����V�\�.�����j����];����7��ut�����K��������8�\sM�q���_�9�p�
��+�l{��w[�d�j9��������>�O���XG��������Cn���B�����vZ{�C�n{���;���������nu�[
�k�Jj���p@
��5P��~���%6���%2^:#|����vy��\>��/l�����nw���w�s��p�e��-�����_�E;��n�}������:��N��gr���+������/E�����%�j��XG����J��?�i{��0����_�����p�������~�v�1��vy��;�&_��t?��������lGy����;��s������W�z�0x]����o�<.�������:��������w���}��g�����>O��-y,P��:Zx@������a��o���=�������O���@w�q������Z���m��������W��e/kO���WK/��~���'�>y\^��W��?��������n}�[��o|��������|��uK��5��P_}��m�vVO>�����������g������*�<p��Z;��������[nl�=��}Z��s�����'�S?�>������WM�����!���������N:i��������o�c�:8�������:�m��&��>��m���}�N>~�S��������m��[��P���9�xb����i��w^����q�������7w��?w��-y,P��:Z��w�������������v��������������������C��8M�����]����n@MPk!�N����mO~��ot@����o[��Kg�78�����n]]?�>��#���n��>��[�}�?��*�_��_�zbr���8�����X���'�7���~4wKk?�������u��sN�l���7��������~[��~����k:|]��y�����;��v���������~���{�-�^q}K���:��u�%�����M��5n��Z��I����v��W���?<������x�+�7<�������o�������u��k�}�~���x�_�9=?���������6�oX���-�>�~���y_|q{��_�nu�[���j���������g���������o���Nj�������|�;����u[�t��������G<��������;��nw�v����3������w���v�s����v�����j�~P��p@
��~���[��6�hH�u>�����~���U�z�����O���5����N29p^hm����J���������7y�����^�s'�k�3�t;j��0P��j�Tp@
�
�S�5`*8�L�����0P��j�Tp@
�
�S�5`*8�L�����0P��j�Tp@
�
�S�5`*8�L�����0P��j�Tp@
����?v ����IEND�B`�
#53David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#52)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, 2 Nov 2020 at 20:43, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml@gmail.com> wrote:
I did some further tests this time with some tuple deforming. Again,
it does seem that v9 is slower than v8.

Graphs attached

Looking at profiles, I don't really see any obvious reason as to why
this is. I'm very much inclined to just pursue the v8 patch (separate
Result Cache node) and just drop the v9 idea altogether.

Nobody raised any objections, so I'll start taking a more serious look
at the v8 version (the patch with the separate Result Cache node).

One thing that I had planned to come back to about now is the name
"Result Cache". I admit to not thinking for too long on the best name
and always thought it was something to come back to later when there's
some actual code to debate a better name for. "Result Cache" was
always a bit of a placeholder name.

Some other names that I'd thought of were:

"MRU Hash"
"MRU Cache"
"Parameterized Tuple Cache" (bit long)
"Parameterized Cache"
"Parameterized MRU Cache"

I know Robert had shown some interest in using a different name. It
would be nice to settle on something most people are happy with soon.

David

#54Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#53)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, Nov 6, 2020 at 6:13 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Mon, 2 Nov 2020 at 20:43, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml@gmail.com> wrote:
I did some further tests this time with some tuple deforming. Again,
it does seem that v9 is slower than v8.

Graphs attached

Looking at profiles, I don't really see any obvious reason as to why
this is. I'm very much inclined to just pursue the v8 patch (separate
Result Cache node) and just drop the v9 idea altogether.

Nobody raised any objections, so I'll start taking a more serious look
at the v8 version (the patch with the separate Result Cache node).

One thing that I had planned to come back to about now is the name
"Result Cache". I admit to not thinking for too long on the best name
and always thought it was something to come back to later when there's
some actual code to debate a better name for. "Result Cache" was
always a bit of a placeholder name.

Some other names that I'd thought of were:

"MRU Hash"
"MRU Cache"
"Parameterized Tuple Cache" (bit long)
"Parameterized Cache"
"Parameterized MRU Cache"

I think "Tuple Cache" would be OK which means it is a cache for tuples.
Telling MRU/LRU would be too internal for an end user and "Parameterized"
looks redundant given that we have said "Cache Key" just below the node
name.

Just my $0.01.

--
Best Regards
Andy Fan

#55Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#52)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml@gmail.com> wrote:

So far benchmarking shows there's still a regression from the v8
version of the patch. This is using count(*). An earlier test [1] did
show speedups when we needed to deform tuples returned by the nested
loop node. I've not yet repeated that test again. I was disappointed
to see v9 slower than v8 after having spent about 3 days rewriting the
patch

I did some further tests this time with some tuple deforming. Again,
it does seem that v9 is slower than v8.

I ran your test case on v8 and v9, and I can reproduce a stable difference
between them.

v8:
statement latencies in milliseconds:
1603.611 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

v9:
statement latencies in milliseconds:
1772.287 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

Then I ran perf on the two versions. Is it possible that you called
tts_minimal_clear twice in the v9 version? Both ExecClearTuple and
ExecStoreMinimalTuple call tts_minimal_clear on the same slot.

With the following changes:

diff --git a/src/backend/executor/execMRUTupleCache.c b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
     TupleTableSlot *tslot = mrucache->tableslot;
     int             numKeys = mrucache->nkeys;
 
-    ExecClearTuple(pslot);
-
     if (key == NULL)
     {
+        ExecClearTuple(pslot);
         /* Set the probeslot's values based on the current parameter values */
         for (int i = 0; i < numKeys; i++)
             pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
         {
             mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;
 
-            ExecClearTuple(mrucache->cachefoundslot);
+            // ExecClearTuple(mrucache->cachefoundslot);
             slot = mrucache->cachefoundslot;
             ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
             return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
             return NULL;
         }
 
-        ExecClearTuple(mrucache->cachefoundslot);
+        // ExecClearTuple(mrucache->cachefoundslot);
         slot = mrucache->cachefoundslot;
         ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
         return slot;

v9 has the following result:
1608.048 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

Graphs attached

Looking at profiles, I don't really see any obvious reason as to why
this is. I'm very much inclined to just pursue the v8 patch (separate
Result Cache node) and just drop the v9 idea altogether.

David

--
Best Regards
Andy Fan

#56David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#55)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, 9 Nov 2020 at 03:52, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Then I ran perf on the two versions. Is it possible that you called
tts_minimal_clear twice in the v9 version? Both ExecClearTuple and
ExecStoreMinimalTuple call tts_minimal_clear on the same slot.

With the following changes:

Thanks for finding that. After applying that fix I did a fresh set of
benchmarks on the latest master, latest master + v8 and latest master
+ v9 using the attached script. (resultcachebench2.sh.txt)

I ran this on my zen2 AMD64 machine and formatted the results into the
attached resultcache_master_vs_v8_vs_v9.csv file

If I load this into PostgreSQL:

# create table resultcache_bench (tbl text, target text, col text,
latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9
numeric(10,3));
# copy resultcache_bench from
'/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv);

and run:

# select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9,
round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from
resultcache_bench group by 1,2,3 order by 2,1,3;

I've attached the results of the above query. (resultcache_v8_vs_v9.txt)

Out of the 24 tests done on each branch, only 6 of 24 are better on v9
compared to v8. So v8 wins on 75% of the tests. v9 never wins using
the lookup1 table (1 row per lookup). It only runs on 50% of the
lookup100 queries (100 inner rows per outer row). However, despite the
draw in won tests for the lookup100 test, v8 takes less time overall,
as indicated by the following query:

postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
from resultcache_bench where tbl='lookup100';
v8_vs_v9
----------
99.3
(1 row)

Ditching the WHERE clause and simply doing:

postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
from resultcache_bench;
v8_vs_v9
----------
96.2
(1 row)

indicates that v8 is 3.8% faster than v9. Altering that query
accordingly indicates v8 is 11.5% faster than master and v9 is only 7%
faster than master.

Of course, scaling up the test will yield both versions being even
more favourable than master, but the point here is comparing v8 to v9.

David

Attachments:

resultcachebench2.sh.txt
resultcache_master_vs_v8_vs_v9.csv
resultcache_v8_vs_v9.txt
#57Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#56)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, Nov 9, 2020 at 10:07 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Mon, 9 Nov 2020 at 03:52, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Then I ran perf on the two versions. Is it possible that you called
tts_minimal_clear twice in the v9 version? Both ExecClearTuple and
ExecStoreMinimalTuple call tts_minimal_clear on the same slot.

With the following changes:

Thanks for finding that. After applying that fix I did a fresh set of
benchmarks on the latest master, latest master + v8 and latest master
+ v9 using the attached script. (resultcachebench2.sh.txt)

I ran this on my zen2 AMD64 machine and formatted the results into the
attached resultcache_master_vs_v8_vs_v9.csv file

If I load this into PostgreSQL:

# create table resultcache_bench (tbl text, target text, col text,
latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9
numeric(10,3));
# copy resultcache_bench from
'/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv);

and run:

# select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9,
round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from
resultcache_bench group by 1,2,3 order by 2,1,3;

I've attached the results of the above query. (resultcache_v8_vs_v9.txt)

Out of the 24 tests done on each branch, only 6 of 24 are better on v9
compared to v8. So v8 wins on 75% of the tests.

I think either version is OK for me, and I like this patch overall.
However, I believe v9 should be no worse than v8 all the time. Is there any
theory to explain your result?

v9 never wins using the lookup1 table (1 row per lookup). It only wins on 50% of the
lookup100 queries (100 inner rows per outer row). However, despite the
draw in won tests for the lookup100 test, v8 takes less time overall,
as indicated by the following query:

postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
from resultcache_bench where tbl='lookup100';
v8_vs_v9
----------
99.3
(1 row)

Ditching the WHERE clause and simply doing:

postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
from resultcache_bench;
v8_vs_v9
----------
96.2
(1 row)

indicates that v8 is 3.8% faster than v9. Altering that query
accordingly indicates v8 is 11.5% faster than master and v9 is only 7%
faster than master.

Of course, scaling up the test will yield both versions being even
more favourable than master, but the point here is comparing v8 to v9.

David

--
Best Regards
Andy Fan

#58David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#57)
2 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, 9 Nov 2020 at 16:29, Andy Fan <zhihui.fan1213@gmail.com> wrote:

I think either version is OK for me and I like this patch overall.

That's good to know. Thanks.

However I believe v9
should be no worse than v8 all the time, Is there any theory to explain
your result?

Nothing jumps out at me from looking at profiles. The only thing I
noticed was the tuple deforming is more costly with v9. I'm not sure
why.

The other part of v9 that I don't have a good solution for yet is the
code around the swapping of the projection info for the Nested Loop.
The cache always uses a MinimalTupleSlot, but we may have a
VirtualSlot when we get a cache miss. If we do then we need to
initialise 2 different projection infos so that when we project from the
cache we have the step to deform the minimal tuple. That step is
not required when the inner slot is a virtual slot.

I did some further testing on performance. Basically, I increased the
size of the tests by 2 orders of magnitude. Instead of 100k rows, I
used 10million rows. (See attached
resultcache_master_vs_v8_vs_v9_big.csv)

Loading that in with:

# create table resultcache_bench2 (tbl text, target text, col text,
latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9
numeric(10,3));
# copy resultcache_bench2 from
'/path/to/resultcache_master_vs_v8_vs_v9_big.csv' with(format csv);

I see that v8 still wins.

postgres=# select round(avg(latency_v8/latency_master)*100,1) as
v8_vs_master, round(avg(latency_v9/latency_master)*100,1) as
v9_vs_master, round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from
resultcache_bench2;
v8_vs_master | v9_vs_master | v8_vs_v9
--------------+--------------+----------
56.7 | 58.8 | 97.3

Execution for all tests for v8 runs in 56.7% of master, but v9 runs in
58.8% of master's time. Full results in
resultcache_master_v8_vs_v9_big.txt. v9 wins in 7 of 24 tests this
time. The best example test for v8 shows that v8 takes 90.6% of the
time of v9, but in the tests where v9 is faster, it only has a 4.3%
lead on v8 (95.7%). You can see that overall v8 is 2.7% faster than v9
for these tests.

David

Attachments:

resultcache_master_v8_vs_v9_big.txt
resultcache_master_vs_v8_vs_v9_big.csv
#59Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: David Rowley (#58)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On 2020-Nov-10, David Rowley wrote:

On Mon, 9 Nov 2020 at 16:29, Andy Fan <zhihui.fan1213@gmail.com> wrote:

However I believe v9
should be no worse than v8 all the time, Is there any theory to explain
your result?

Nothing jumps out at me from looking at profiles. The only thing I
noticed was the tuple deforming is more costly with v9. I'm not sure
why.

Are you taking into account the possibility that generated machine code
is a small percent slower out of mere bad luck? I remember someone
suggesting that they can make code 2% faster or so by inserting random
no-op instructions in the binary, or something like that. So if the
difference between v8 and v9 is that small, then it might be due to this
kind of effect.

I don't know what is a good technique to test this hypothesis.

#60Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#59)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Are you taking into account the possibility that generated machine code
is a small percent slower out of mere bad luck? I remember someone
suggesting that they can make code 2% faster or so by inserting random
no-op instructions in the binary, or something like that. So if the
difference between v8 and v9 is that small, then it might be due to this
kind of effect.

Yeah. I believe what this arises from is good or bad luck about relevant
tight loops falling within or across cache lines, and that sort of thing.
We've definitely seen performance changes up to a couple percent with
no apparent change to the relevant code.

regards, tom lane

#61David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#60)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 10 Nov 2020 at 12:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Are you taking into account the possibility that generated machine code
is a small percent slower out of mere bad luck? I remember someone
suggesting that they can make code 2% faster or so by inserting random
no-op instructions in the binary, or something like that. So if the
difference between v8 and v9 is that small, then it might be due to this
kind of effect.

Yeah. I believe what this arises from is good or bad luck about relevant
tight loops falling within or across cache lines, and that sort of thing.
We've definitely seen performance changes up to a couple percent with
no apparent change to the relevant code.

It possibly is this issue.

Normally how I build up my confidence in which is faster is by just
rebasing on master as it advances and seeing if the winner ever changes.
The theory here is that if one patch is consistently the fastest, then
there's more chance of there being a genuine reason for it.

So far I've only rebased v9 twice. Both times it was slower than v8.
Since the benchmarks are all scripted, it's simple enough to kick off
another round to see if anything has changed.

I do happen to prefer having the separate Result Cache node (v8), so
from my point of view, even if the performance was equal, I'd rather
have v8. I understand that some others feel different though.

David

#62Peter Geoghegan
In reply to: Tom Lane (#60)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, Nov 9, 2020 at 3:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Are you taking into account the possibility that generated machine code
is a small percent slower out of mere bad luck? I remember someone
suggesting that they can make code 2% faster or so by inserting random
no-op instructions in the binary, or something like that. So if the
difference between v8 and v9 is that small, then it might be due to this
kind of effect.

Yeah. I believe what this arises from is good or bad luck about relevant
tight loops falling within or across cache lines, and that sort of thing.
We've definitely seen performance changes up to a couple percent with
no apparent change to the relevant code.

That was Andrew Gierth. And it was 5% IIRC.

In theory it should be possible to control for this using a tool like
stabilizer:

https://github.com/ccurtsinger/stabilizer

I am not aware of anybody having actually used the tool with Postgres,
though. It looks rather inconvenient.

--
Peter Geoghegan

#63Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#61)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, Nov 10, 2020 at 7:55 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 10 Nov 2020 at 12:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Are you taking into account the possibility that generated machine code
is a small percent slower out of mere bad luck? I remember someone
suggesting that they can make code 2% faster or so by inserting random
no-op instructions in the binary, or something like that. So if the
difference between v8 and v9 is that small, then it might be due to this
kind of effect.

Yeah. I believe what this arises from is good or bad luck about relevant
tight loops falling within or across cache lines, and that sort of thing.
We've definitely seen performance changes up to a couple percent with
no apparent change to the relevant code.

I do happen to prefer having the separate Result Cache node (v8), so
from my point of view, even if the performance was equal, I'd rather
have v8. I understand that some others feel different though.

While I am interested in what caused the tiny difference, I admit that the
direction this patch should go in is more important. I'm not sure anyone is
convinced that v8 and v9 have similar performance; the current data show
they are similar. I want to profile and read the code more, but I don't
know which part I should pay attention to, so any hints on why v9 should be
noticeably better in theory would be very helpful. After that, I'd like to
read the code or profile more carefully.

--
Best Regards
Andy Fan

#64David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#63)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 10 Nov 2020 at 15:38, Andy Fan <zhihui.fan1213@gmail.com> wrote:

While I am interested in what caused the tiny difference, I admit that the
direction this patch should go in is more important. I'm not sure anyone is
convinced that v8 and v9 have similar performance; the current data show
they are similar. I want to profile and read the code more, but I don't
know which part I should pay attention to, so any hints on why v9 should be
noticeably better in theory would be very helpful. After that, I'd like to
read the code or profile more carefully.

The thought was that by putting the cache code directly inside
nodeNestloop.c, the overhead of fetching a tuple from a subnode could be
eliminated when we get a cache hit.

A cache hit on v8 looks like:

Nest loop -> Fetch new outer row
Nest loop -> Fetch inner row
Result Cache -> cache hit return first cached tuple
Nest loop -> eval qual and return tuple if matches

With v9 it's more like:

Nest Loop -> Fetch new outer row
Nest loop -> cache hit return first cached tuple
Nest loop -> eval qual and return tuple if matches

So 1 less hop between nodes.

In reality, the hop is not that expensive, so it might not be a big
enough factor to slow the execution down.

There's some extra complexity in v9 around the slot type of the inner
tuple. A cache hit means the slot type is Minimal. But a miss means
the slot type is whatever type the inner node's slot is. So some code
exists to switch the qual and projection info around depending on if
we get a cache hit or a miss.
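
To make that a bit more concrete, here's a minimal sketch of the kind of
switch that has to happen before projecting. The nl_pi_cache and nl_pi_inner
fields are made-up names used only for illustration; the v9 patch's actual
code differs.

static TupleTableSlot *
project_inner_row(NestLoopState *nlstate, bool cache_hit)
{
    /*
     * A cache hit hands back a MinimalTupleSlot, while a miss hands back
     * the inner node's own slot type, so two ProjectionInfos are built at
     * init time and the appropriate one is swapped in before projecting.
     */
    ProjectionInfo *proj = cache_hit ? nlstate->nl_pi_cache :
        nlstate->nl_pi_inner;

    nlstate->js.ps.ps_ProjInfo = proj;
    return ExecProject(proj);
}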

I did some calculations on how costly pulling a tuple through a node is in [1].

David

[1]: /messages/by-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com

#65David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#61)
3 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 10 Nov 2020 at 12:55, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 10 Nov 2020 at 12:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Are you taking into account the possibility that generated machine code
is a small percent slower out of mere bad luck? I remember someone
suggesting that they can make code 2% faster or so by inserting random
no-op instructions in the binary, or something like that. So if the
difference between v8 and v9 is that small, then it might be due to this
kind of effect.

Yeah. I believe what this arises from is good or bad luck about relevant
tight loops falling within or across cache lines, and that sort of thing.
We've definitely seen performance changes up to a couple percent with
no apparent change to the relevant code.

It possibly is this issue.

Normally how I build up my confidence in which is faster is by just
rebasing on master as it advances and seeing if the winner ever changes.
The theory here is that if one patch is consistently the fastest, then
there's more chance of there being a genuine reason for it.

I kicked off a script last night that ran benchmarks on master, v8 and
v9 of the patch on 1 commit per day for the past 30 days since
yesterday. The idea here is that, as the code changes, if the performance
differences are due to code alignment then there should be enough churn in
30 days to show it.

The quickly put together script is attached. It would need quite a bit
of modification to run on someone else's machine.

This took about 20 hours to run. I found that v8 is faster on 28 out
of 30 commits. In the two cases where v9 was faster, v9 took 99.8% and
98.5% of the time of v8. In the 28 cases where v8 was faster it was
generally about 2-4% faster, but a couple of times 8-10% faster. Full
results attached in .csv file. Also the query I ran to compare the
results once loaded into Postgres.

David

Attachments:

resultcachebench3.sh.txt
resultcache_small_multiple_version.csv
resultcache_30_commits_test_results.txt
#66Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#65)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi David:

I did a review of the v8 patch, and it looks great to me. Here are some
tiny things I noted, just FYI.

 1. modified   src/include/utils/selfuncs.h
@@ -70,9 +70,9 @@
  * callers to provide further details about some assumptions which were
made
  * during the estimation.
  */
-#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one of
-  * the DEFAULTs as defined above.
-  */
+#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one
+ * of the DEFAULTs as defined
+ * above. */

It looks like nothing has changed.

2. The leading spaces are not necessary in comments.

/*
* ResultCacheTuple Stores an individually cached tuple
*/
typedef struct ResultCacheTuple
{
MinimalTuple mintuple; /* Cached tuple */
struct ResultCacheTuple *next; /* The next tuple with the same parameter
* values or NULL if it's the last one */
} ResultCacheTuple;

3. We define ResultCacheKey as below.

/*
* ResultCacheKey
* The hash table key for cached entries plus the LRU list link
*/
typedef struct ResultCacheKey
{
MinimalTuple params;
dlist_node lru_node; /* Pointer to next/prev key in LRU list */
} ResultCacheKey;

Since we store it as a MinimalTuple, we need a FETCH_INNER_VAR step for
each element during the ResultCacheHash_equal call. I am wondering if we
could store a "Datum *" directly to save these steps.
exec_aggvalues/exec_aggnulls looks like a good candidate to me, except that
the name doesn't look right. IMO, we could rename
exec_aggvalues/exec_aggnulls and try to merge EEOP_AGGREF/EEOP_WINDOW_FUNC
into a more generic step which could be reused in this case.

4. I think the ExecClearTuple in prepare_probe_slot is not a must, since
tts_values/tts_flags/tts_nvalid are all reset later, and tts_tid is not
really used in our case. Since both prepare_probe_slot and
ResultCacheHash_equal are on a pretty hot path, we may want to consider
this.

static inline void
prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
{
...
ExecClearTuple(pslot);
...
}

static void
tts_virtual_clear(TupleTableSlot *slot)
{
if (unlikely(TTS_SHOULDFREE(slot)))
{
VirtualTupleTableSlot *vslot = (VirtualTupleTableSlot *) slot;

pfree(vslot->data);
vslot->data = NULL;

slot->tts_flags &= ~TTS_FLAG_SHOULDFREE;
}

slot->tts_nvalid = 0;
slot->tts_flags |= TTS_FLAG_EMPTY;
ItemPointerSetInvalid(&slot->tts_tid);
}

--
Best Regards
Andy Fan

#67Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#66)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Sun, Nov 22, 2020 at 9:21 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi David:

I did a review of the v8 patch, and it looks great to me. Here are some
tiny things I noted, just FYI.

1. modified   src/include/utils/selfuncs.h
@@ -70,9 +70,9 @@
* callers to provide further details about some assumptions which were
made
* during the estimation.
*/
-#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one of
-  * the DEFAULTs as defined above.
-  */
+#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one
+ * of the DEFAULTs as defined
+ * above. */

It looks like nothing has changed.

2. The leading spaces are not necessary in comments.

/*
* ResultCacheTuple Stores an individually cached tuple
*/
typedef struct ResultCacheTuple
{
MinimalTuple mintuple; /* Cached tuple */
struct ResultCacheTuple *next; /* The next tuple with the same parameter
* values or NULL if it's the last one */
} ResultCacheTuple;

3. We define ResultCacheKey as below.

/*
* ResultCacheKey
* The hash table key for cached entries plus the LRU list link
*/
typedef struct ResultCacheKey
{
MinimalTuple params;
dlist_node lru_node; /* Pointer to next/prev key in LRU list */
} ResultCacheKey;

Since we store it as a MinimalTuple, we need a FETCH_INNER_VAR step for
each element during the ResultCacheHash_equal call. I am wondering if we
could store a "Datum *" directly to save these steps.
exec_aggvalues/exec_aggnulls looks like a good candidate to me, except that
the name doesn't look right. IMO, we could rename
exec_aggvalues/exec_aggnulls and try to merge EEOP_AGGREF/EEOP_WINDOW_FUNC
into a more generic step which could be reused in this case.

4. I think the ExecClearTuple in prepare_probe_slot is not a must, since
tts_values/tts_flags/tts_nvalid are all reset later, and tts_tid is not
really used in our case. Since both prepare_probe_slot and
ResultCacheHash_equal are on a pretty hot path, we may want to consider
this.

static inline void
prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
{
...
ExecClearTuple(pslot);
...
}

static void
tts_virtual_clear(TupleTableSlot *slot)
{
if (unlikely(TTS_SHOULDFREE(slot)))
{
VirtualTupleTableSlot *vslot = (VirtualTupleTableSlot *) slot;

pfree(vslot->data);
vslot->data = NULL;

slot->tts_flags &= ~TTS_FLAG_SHOULDFREE;
}

slot->tts_nvalid = 0;
slot->tts_flags |= TTS_FLAG_EMPTY;
ItemPointerSetInvalid(&slot->tts_tid);
}

--
Best Regards
Andy Fan

add 2 more comments.

1. I'd suggest adding Assert(false); in RC_END_OF_SCAN case to make the
error clearer.

case RC_END_OF_SCAN:
/*
* We've already returned NULL for this scan, but just in case
* something call us again by mistake.
*/
return NULL;

2. Currently we handle the (!cache_store_tuple(node, outerslot)) case by
setting it to RC_CACHE_BYPASS_MODE. The only reason for the
cache_store_tuple failure is that we can't cache_reduce_memory. I guess if
cache_reduce_memory failed once, it would not succeed later (no more tuples
can be stored, nothing is changed). So I think we can record this state and
avoid any new cache_reduce_memory call.

/*
* If we failed to create the entry or failed to store the
* tuple in the entry, then go into bypass mode.
*/
if (unlikely(entry == NULL ||
!cache_store_tuple(node, outerslot)))

to

if (unlikely(entry == NULL || node->memory_cant_be_reduced ||
!cache_store_tuple(node, outerslot)))

--
Best Regards
Andy Fan

#68David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#67)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for having another look at this.

On Sun, Nov 22, 2020 at 9:21 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
add 2 more comments.

1. I'd suggest adding Assert(false); in RC_END_OF_SCAN case to make the error clearer.

case RC_END_OF_SCAN:
/*
* We've already returned NULL for this scan, but just in case
* something call us again by mistake.
*/
return NULL;

This just took some inspiration from nodeMaterial.c where it says:

/*
* If necessary, try to fetch another row from the subplan.
*
* Note: the eof_underlying state variable exists to short-circuit further
* subplan calls. It's not optional, unfortunately, because some plan
* node types are not robust about being called again when they've already
* returned NULL.
*/

I'm not feeling a pressing need to put an Assert(false); in there as
it's not what nodeMaterial.c does. nodeMaterial is nodeResultCache's
sister node which can also be seen below Nested Loops.

2. Currently we handle the (!cache_store_tuple(node, outerslot)) case by
setting it to RC_CACHE_BYPASS_MODE. The only reason for the
cache_store_tuple failure is that we can't cache_reduce_memory. I guess if
cache_reduce_memory failed once, it would not succeed later (no more tuples
can be stored, nothing is changed). So I think we can record this state and
avoid any new cache_reduce_memory call.

/*
* If we failed to create the entry or failed to store the
* tuple in the entry, then go into bypass mode.
*/
if (unlikely(entry == NULL ||
!cache_store_tuple(node, outerslot)))

to

if (unlikely(entry == NULL || node->memory_cant_be_reduced ||
!cache_store_tuple(node, outerslot)))

The reason for RC_CACHE_BYPASS_MODE is to handle the case where there's a
single set of parameters with so many results that they, alone, don't fit
in the cache. We call cache_reduce_memory() whenever we go over our
memory budget. That function returns false if it was unable to free
enough memory without removing the "specialkey", which in this case is
the current cache entry that's being populated. Later, when we're
caching some entry that isn't quite so large, we still want to be able
to cache that. In that case, we'll have removed the remnants of the
overly large entry that didn't fit, to make way for newer and, hopefully,
smaller entries. No problems. I'm not sure why there's a need for
another flag here.

A bit more background.

When caching a new entry, or finding an existing entry, we move that
entry to the top of the MRU dlist. When adding entries or tuples to
existing entries, if we've gone over memory budget, then we remove
cache entries from the MRU list starting at the tail (least recently
used). If we begin caching tuples for an entry and need to free some
space, then since we've put the current entry to the top of the MRU
list, it'll be the last one to be removed. However, it's still
possible that we run through the entire MRU list and end up at the
most recently used item. So the entry we're populating can also be
removed if freeing everything else was still not enough to give us
enough free memory. The code refers to this as a cache overflow. This
causes the state machine to move into RC_CACHE_BYPASS_MODE. We'll
just read tuples directly from the subnode in that case, no need to
attempt to cache them. They're not going to fit. We'll come out of
RC_CACHE_BYPASS_MODE when doing the next rescan with a different set
of parameters. This is our chance to try caching things again. The
code does that. There might be far fewer tuples for the next parameter
we're scanning for, or those tuples might be narrower. So it makes
sense to give caching them another try. Perhaps there's some point
where we should give up doing that, but given good statistics, it's
unlikely the planner would have thought a result cache would have been
worth the trouble and would likely have picked some other way to
execute the plan. The planner does estimate the average size of a
cache entry and calculates how many of those fit into a hash_mem. If
that number is too low then Result Caching the inner side won't be too
appealing. Of course, calculating the average does not mean there are
no outliers. We'll deal with the large side of the outliers with the
bypass code.
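
To put a rough shape on that, the eviction logic looks something like the
sketch below. This isn't the patch's actual code; mem_used, mem_limit,
lru_list and remove_cache_entry() are invented names used only to
illustrate the idea.

static bool
cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
{
    /* Evict from the least recently used end until we're under budget */
    while (rcstate->mem_used > rcstate->mem_limit &&
           !dlist_is_empty(&rcstate->lru_list))
    {
        dlist_node *node = dlist_tail_node(&rcstate->lru_list);
        ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node, node);

        /*
         * The entry currently being populated (the "specialkey") must not
         * be evicted.  If we reach it, we couldn't free enough memory, so
         * the caller switches into RC_CACHE_BYPASS_MODE.
         */
        if (key == specialkey)
            return false;

        remove_cache_entry(rcstate, key);  /* frees the tuples and hash entry */
    }

    return true;
}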

I currently don't really see what needs to be changed about that.

David

#69Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#68)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, Nov 27, 2020 at 8:10 AM David Rowley <dgrowleyml@gmail.com> wrote:

Thanks for having another look at this.

On Sun, Nov 22, 2020 at 9:21 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

add 2 more comments.

1. I'd suggest adding Assert(false); in RC_END_OF_SCAN case to make the
error clearer.

case RC_END_OF_SCAN:
/*
* We've already returned NULL for this scan, but just in case
* something call us again by mistake.
*/
return NULL;

This just took some inspiration from nodeMaterial.c where it says:

/*
* If necessary, try to fetch another row from the subplan.
*
* Note: the eof_underlying state variable exists to short-circuit further
* subplan calls. It's not optional, unfortunately, because some plan
* node types are not robust about being called again when they've already
* returned NULL.
*/

I'm not feeling a pressing need to put an Assert(false); in there as
it's not what nodeMaterial.c does. nodeMaterial is nodeResultCache's
sister node which can also be seen below Nested Loops.

OK. Even though I don't quite understand the above right now, I will try to
figure it out by myself. I'm OK with this decision.

2. Currently we handle the (!cache_store_tuple(node, outerslot)) case by
setting it to RC_CACHE_BYPASS_MODE. The only reason for the
cache_store_tuple failure is that we can't cache_reduce_memory. I guess if
cache_reduce_memory failed once, it would not succeed later (no more tuples
can be stored, nothing is changed). So I think we can record this state and
avoid any new cache_reduce_memory call.

/*
* If we failed to create the entry or failed to store the
* tuple in the entry, then go into bypass mode.
*/
if (unlikely(entry == NULL ||
!cache_store_tuple(node, outerslot)))

to

if (unlikely(entry == NULL || node->memory_cant_be_reduced ||
!cache_store_tuple(node, outerslot)))

The reason for RC_CACHE_BYPASS_MODE is to handle the case where there's a
single set of parameters with so many results that they, alone, don't fit
in the cache. We call cache_reduce_memory() whenever we go over our
memory budget. That function returns false if it was unable to free
enough memory without removing the "specialkey", which in this case is
the current cache entry that's being populated. Later, when we're
caching some entry that isn't quite so large, we still want to be able
to cache that. In that case, we'll have removed the remnants of the
overly large entry that didn't fit, to make way for newer and, hopefully,
smaller entries. No problems. I'm not sure why there's a need for
another flag here.

Thanks for the explanation. I'm sure I made some mistakes about this part
before.

--
Best Regards
Andy Fan

#70David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#65)
4 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 12 Nov 2020 at 15:36, David Rowley <dgrowleyml@gmail.com> wrote:

I kicked off a script last night that ran benchmarks on master, v8 and
v9 of the patch on 1 commit per day for the past 30 days since
yesterday. The idea here is that, as the code changes, if the performance
differences are due to code alignment then there should be enough churn in
30 days to show it.

The quickly put together script is attached. It would need quite a bit
of modification to run on someone else's machine.

This took about 20 hours to run. I found that v8 is faster on 28 out
of 30 commits. In the two cases where v9 was faster, v9 took 99.8% and
98.5% of the time of v8. In the 28 cases where v8 was faster it was
generally about 2-4% faster, but a couple of times 8-10% faster. Full
results attached in .csv file. Also the query I ran to compare the
results once loaded into Postgres.

Since running those benchmarks, Andres spent a little bit of time
looking at the v9 patch and he pointed out that I can use the same
projection info in the nested loop code with and without a cache hit.
I just need to ensure that inneropsfixed is false so that the
expression compilation includes a deform step when result caching is
enabled. Making it work like that did make a small performance
improvement, but further benchmarking showed that it was still not as
fast as the v8 patch (separate Result Cache node).

Due to that, I want to push forward with having the separate Result
Cache node and just drop the idea of including the feature as part of
the Nested Loop node.

I've attached an updated patch, v10. This is v8 with a few further
changes; I added the peak memory tracking and adjusted a few comments.
I added a paragraph to explain what RC_CACHE_BYPASS_MODE is. I also
noticed that the code I'd written to build the cache lookup expression
included a step to deform the outer tuple. This was unnecessary and
slowed down the expression evaluation.

I'm fairly happy with patches 0001 to 0003. However, I ended up
stripping the subplan caching code out of 0003 and putting it in
0004. This part I'm not so happy with. The problem there is that when
planning a correlated subquery we don't have any context to determine
how many distinct values the subplan will be called with. For now, the
0004 patch just always includes a Result Cache for correlated
subqueries. The reason I don't like that is that it could slow things
down when the cache never gets a hit. The additional cost of adding
tuples to the cache is going to slow things down.

I'm not yet sure of the best way to make 0004 better. I don't think using
AlternativeSubplans is a good choice, as it means having to build two
subplans. Also, determining the cheapest plan to use couldn't reuse the
existing logic in fix_alternative_subplan(). It might be best left until we
do some refactoring so that, instead of building subplans as soon as we've
run the planner, we keep a list of Paths around and then choose the best
Path once the top-level plan has been planned. That's a pretty big change.

On making another pass over this patchset, I feel there are two points
that might still raise a few eyebrows:

1. In order to not have Nested Loops picked with an inner Result Cache
when the inner index's parameters have no valid statistics, I modified
estimate_num_groups() to add a new parameter that allows callers to
pass an EstimationInfo struct so the function can set a flag to
indicate if DEFAULT_NUM_DISTINCT was used. Callers which don't care
about this can just pass NULL. (A rough sketch of how a caller might
check this flag is shown below.) I did once try adding a new parameter
to clauselist_selectivity() in 2686ee1b. There was not much
excitement about that, and we ended up removing it again. I don't see any
alternative here.

2. Nobody really mentioned they didn't like the name Result Cache. I
really used that as a placeholder name until I came up with something
better. I mentioned a few other names in [1]. If nobody is objecting
to Result Cache, I'll just keep it named that way.
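
Going back to point 1, a caller can then do something like the following.
This is just a rough sketch for illustration rather than the patch's exact
code; param_exprs and outer_path stand in for whatever the caller has at
hand, and the surrounding logic is simplified.

    EstimationInfo estinfo;
    double      ndistinct;

    ndistinct = estimate_num_groups(root, param_exprs, outer_path->rows,
                                    NULL, &estinfo);

    /*
     * If the estimate fell back on DEFAULT_NUM_DISTINCT then the expected
     * cache hit ratio would be based on a pure guess, so don't consider a
     * Result Cache for this join.
     */
    if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
        return;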

David

[1]: /messages/by-id/CAApHDvoj_sH1H3JVXgHuwnxf1FQbjRVOqqgxzOgJX13NiA9-cg@mail.gmail.com

Attachments:

v10-0003-Add-Result-Cache-executor-node.patch
From 5ca69b57c7ebc4480ffd30883b56c7c91344dcce Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v10 3/4] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  147 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1147 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  227 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   70 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   68 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2769 insertions(+), 174 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..aaa7544177 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1573,6 +1573,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1599,6 +1600,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..2e533999d1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -479,10 +479,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d6901c..f91d7bfc55 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4849,6 +4849,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..fadadef050 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1264,6 +1266,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1955,6 +1960,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3028,6 +3037,144 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we have freed memory, so we must use mem_used
+	 * when mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do anything.  We needn't consider
+			 * cache hits as we'll always get a miss before a hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 0c10f1d35c..f5786e9205 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 79b325c7cf..86ff12537c 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3466,3 +3466,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use;
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..b1b313aae6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 01b7b926bf..fbbe667cc1 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..4201c7eb10
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1147 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- materialize the result of a subplan
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/*
+ * States of the ExecResultCache state machine
+ */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
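+
+/*
+ * A scan starts in RC_CACHE_LOOKUP.  On a cache hit the cached tuples are
+ * returned one by one via RC_CACHE_FETCH_NEXT_TUPLE.  On a cache miss we
+ * move to RC_FILLING_CACHE and copy each tuple read from the outer node
+ * into the new entry as we return it.  If the entry being built cannot be
+ * kept within the memory budget then we give up on it and switch to
+ * RC_CACHE_BYPASS_MODE for the remainder of the scan.  Each of these paths
+ * ends in RC_END_OF_SCAN, and a rescan resets the machine to
+ * RC_CACHE_LOOKUP.
+ */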
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+ /*
+  * ResultCacheTuple Stores an individually cached tuple
+  */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used, instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_lowerlimit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the
+ * entry which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_upperlimit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_lowerlimit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_upperlimit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+					else
+					{
+						/* The cache entry is void of any tuples. */
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					/*
+					 * Validate if the planner properly set the singlerow
+					 * flag.  It should only set that if each cache entry can,
+					 * at most, return 1 row.  XXX is this worth the check?
+					 */
+					if (unlikely(entry->complete))
+						elog(ERROR, "cache entry already complete");
+
+					/* Record the tuple in the current cache entry */
+					if (unlikely(!cache_store_tuple(node, outerslot)))
+					{
+						/* Couldn't store it?  Handle overflow */
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out entry or last_tuple as we'll
+						 * stay in bypass mode until the end of the scan.
+						 */
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_upperlimit = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the lower limit to something a bit less than the upper limit so
+	 * that we don't have to evict tuples every time we need to add a new one
+	 * after the cache has filled.  We don't make it too much smaller as we'd
+	 * like to keep as much in the cache as possible.
+	 */
+	rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match. e.g. A join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as so.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 910906f639..10b55f33ad 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -925,6 +925,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4930,6 +4957,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9c73c605a4..ad265c8e90 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -834,6 +834,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3823,6 +3853,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4057,6 +4090,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 169d5581b9..75e766387a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b064a..9f538814c5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4166,6 +4166,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCachePath:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d2bf9912e9..f550d36407 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2306,6 +2308,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must know how often we expect to be called
+ * and how many distinct sets of parameters we are likely to be called with.
+ * If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide better estimates of how many cache entries we can store at
+	 * once, we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
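+
+	/*
+	 * For example, with purely illustrative numbers: 4MB of available hash
+	 * memory and est_entry_bytes of 2048 would give est_cache_entries =
+	 * floor(4194304 / 2048) = 2048.
+	 */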
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values exceeds the number of
+	 * entries we can store in the cache, we'll have to evict some entries.
+	 * This is not free.  Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
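+
+	/*
+	 * For example, with purely illustrative numbers: ndistinct = 200 and
+	 * est_cache_entries = 150 gives evict_ratio = 1.0 - 150.0 / 200.0 = 0.25,
+	 * i.e. we expect roughly a quarter of scans to cause an eviction.
+	 */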
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
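+
+	/*
+	 * For example, with purely illustrative numbers: calls = 1000, ndistinct
+	 * = 100 and est_cache_entries >= 100 gives hit_ratio = 1.0 - 0.1 = 0.9,
+	 * i.e. we expect about 90% of rescans to find their parameter values
+	 * already cached.
+	 */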
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it's a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4046,6 +4189,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 4a35903b29..53d8df3632 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving inner uniqueness here to allow a
+			 * ResultCache to be considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,193 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * We also fetch the outer-side exprs and check that a valid hashable
+ * equality operator exists for each one.  Returns true and sets the
+ * 'param_exprs' and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a ResultCache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+			var_relids = pull_varnos((Node *) ((PlaceHolderVar *) expr)->phexpr);
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node is likely to be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1669,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1678,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1848,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1873,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 40abe6f9f6..820f679f69 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1514,6 +1527,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6343,6 +6406,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6929,6 +7014,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6974,6 +7060,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 127ea3d856..9ba06671c0 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -735,6 +735,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index fcce81926b..7a38a1a4ae 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2748,6 +2748,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e1aaeecc8a..e8dbc90fd6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1551,6 +1551,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3852,6 +3901,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4070,6 +4130,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 02d2d267b5..f07d2766b9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1016,6 +1016,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9c9091e601..599ab6d850 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -365,6 +365,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 0c48d2a519..8e76f63635 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *	  prototypes for nodeResultCache.c
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..b465e706fa 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1968,6 +1969,73 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_upperlimit; /* memory limit in bytes for the cache */
+	uint64		mem_lowerlimit; /* reduce memory usage to below this when we
+								 * free up space */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 3684f87a88..39a9502e87 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -239,6 +241,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b4059895de..e66f6e74be 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1462,6 +1462,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a node that
+ * caches tuples from a parameterized path so that the underlying node need
+ * not be rescanned for parameter values that are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7e6b10f86b..64f752d9fc 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 8e621d2f76..b5a20fa01e 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 3bd7072ae8..fa13a6df37 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..1eb0f7346b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2577,6 +2577,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2593,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a118041731..dbc872b489 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3575,8 +3577,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3586,17 +3588,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3606,9 +3610,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4122,8 +4128,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4133,11 +4139,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4158,13 +4167,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4174,15 +4183,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4228,14 +4243,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4873,34 +4891,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4955,14 +4979,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index c72a6d051f..141a6c89e2 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2061,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2071,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2144,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2180,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2217,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..205cbb82ab
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 9d56cdacf3..0b023a0bbb 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..cbf371017e 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index ae89ed7f0b..8fee8ad621 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 525bdc804f..4be9f4e99e 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -199,6 +199,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4de24c1904..909306a40a 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ffd5fe8b0d..a55711cc7f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..2a84cf3845
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

v10-0004-Use-a-Result-Cache-node-to-cache-results-from-su.patch
From 8be3ab1f9db137b848d9ef1ca02bf43addaba5bf Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Fri, 4 Dec 2020 00:39:48 +1300
Subject: [PATCH v10 4/4] Use a Result Cache node to cache results from
 subplans

---
 .../postgres_fdw/expected/postgres_fdw.out    |  49 +++++----
 src/backend/optimizer/plan/subselect.c        | 103 ++++++++++++++++++
 src/test/regress/expected/aggregates.out      |   6 +-
 src/test/regress/expected/groupingsets.out    |  20 ++--
 src/test/regress/expected/join.out            |  16 +--
 src/test/regress/expected/join_hash.out       |  58 +++++++---
 src/test/regress/expected/resultcache.out     |  37 +++++++
 src/test/regress/expected/rowsecurity.out     |  20 ++--
 src/test/regress/expected/select_parallel.out |  28 +++--
 src/test/regress/expected/subselect.out       |  20 ++--
 src/test/regress/sql/resultcache.sql          |   9 ++
 11 files changed, 287 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index aaa7544177..fc9e18d636 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2112,22 +2112,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2908,10 +2911,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2922,8 +2928,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2933,11 +2939,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 7a38a1a4ae..9b93cb27ac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -137,6 +138,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters, then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * function in the subquery.  Caching could cause fewer evaluations of
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side-effects.
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -234,6 +303,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINK subplans, we add a Result
+	 * Cache node to the top of the subplan in order to cache previously
+	 * looked-up results in the hope that they'll be needed again by a
+	 * subsequent call.  At this stage we don't have any details of how often
+	 * we'll be called or with which values, so for now, we add the Result
+	 * Cache regardless.  It may be better to do this only when it seems
+	 * likely that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *rcpath;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			rcpath = create_resultcache_path(root,
+											 best_path->parent,
+											 best_path,
+											 param_exprs,
+											 operators,
+											 false,
+											 -1);
+			best_path = (Path *) rcpath;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1eb0f7346b..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 701d52b465..2256f6da67 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -774,19 +774,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dbc872b489..4e3d893ec3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2976,8 +2976,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2987,11 +2987,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..9f04684fcd 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_2.c * 5)
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_1.b * 5)
-(28 rows)
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_1.b * 5)
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
          SubPlan 2
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_2.c * 5)
-(28 rows)
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
index 205cbb82ab..7870102f0a 100644
--- a/src/test/regress/expected/resultcache.out
+++ b/src/test/regress/expected/resultcache.out
@@ -151,3 +151,40 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+-- Ensure we get the expected plan with subplans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+                                explain_resultcache                                
+-----------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+   Recheck Cond: (unique1 < 1000)
+   Heap Blocks: exact=333
+   ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+         Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.twenty
+           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(13 rows)
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 9000  Misses: 1000  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 9b0c418db7..a3caf95c8d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1168,9 +1172,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 0b023a0bbb..2faf4a6c40 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -968,19 +968,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
index 2a84cf3845..bbd1bcd013 100644
--- a/src/test/regress/sql/resultcache.sql
+++ b/src/test/regress/sql/resultcache.sql
@@ -76,3 +76,12 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
-- 
2.27.0

v10-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch
From d5dd10b9222b330f2500828b4accb273aaa21f15 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v10 2/4] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, so it can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v10-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch
From 244f8f87212fdd3fc0819f1209a2d2bfe203cd12 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v10 1/4] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits so that it can pass back additional
details to the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..796ece6b3b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2953,7 +2953,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..d2bf9912e9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index bcb1bc6097..4f6ab5d635 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1986,6 +1986,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..ea7b0dd601 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4793,7 +4795,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 745f443e5c..f33033bc27 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 51478957fb..e1aaeecc8a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 80bd60f876..910515ffb2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 3a2cfb7efa..a50e9ad5f4 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	int			flags;			/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +208,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

#71Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#70)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for working on the new version.

On Fri, Dec 4, 2020 at 10:41 PM David Rowley <dgrowleyml@gmail.com> wrote:

I also
noticed that the code I'd written to build the cache lookup expression
included a step to deform the outer tuple. This was unnecessary and
slowed down the expression evaluation.

I thought it would be something like my 3rd suggestion on [1]/messages/by-id/CAApHDvqvGZUPKHO+4Xp7Lm_q1OXBo2Yp1=5pVnEUcr4dgOXxEg@mail.gmail.com, but after
reading the code it looks like it isn't. Could you explain what the change is?
I probably missed something.

[1]: /messages/by-id/CAApHDvqvGZUPKHO+4Xp7Lm_q1OXBo2Yp1=5pVnEUcr4dgOXxEg@mail.gmail.com

--
Best Regards
Andy Fan

#72David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#66)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for this review. I somehow missed addressing what's mentioned
here for the v10 patch. Comments below.

On Mon, 23 Nov 2020 at 02:21, Andy Fan <zhihui.fan1213@gmail.com> wrote:

1. modified   src/include/utils/selfuncs.h
@@ -70,9 +70,9 @@
* callers to provide further details about some assumptions which were made
* during the estimation.
*/
-#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one of
-  * the DEFAULTs as defined above.
-  */
+#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one
+ * of the DEFAULTs as defined
+ * above. */

It looks like nothing has changed.

I accidentally took the changes made by pgindent into the wrong patch.
Fixed that in v10.

2. Leading spaces are not necessary in comments.

/*
* ResultCacheTuple Stores an individually cached tuple
*/
typedef struct ResultCacheTuple
{
MinimalTuple mintuple; /* Cached tuple */
struct ResultCacheTuple *next; /* The next tuple with the same parameter
* values or NULL if it's the last one */
} ResultCacheTuple;

OK, I've changed that so that they're on 1 line instead of 3.

3. We define ResultCacheKey as below.

/*
* ResultCacheKey
* The hash table key for cached entries plus the LRU list link
*/
typedef struct ResultCacheKey
{
MinimalTuple params;
dlist_node lru_node; /* Pointer to next/prev key in LRU list */
} ResultCacheKey;

Since we store it as a MinimalTuple, we need a FETCH_INNER_VAR step for
each element during the ResultCacheHash_equal call. I am wondering if we can
store a "Datum *" directly to save these steps. exec_aggvalues/exec_aggnulls
looks like a good candidate to me, except that the name doesn't fit well. IMO,
we could rename exec_aggvalues/exec_aggnulls and try to merge
EEOP_AGGREF/EEOP_WINDOW_FUNC into a more generic step which could be
reused in this case.

I think this is along the lines of what I'd been thinking about and
mentioned internally to Thomas and Andres. I called it a MemTuple and
it was basically a contiguous block of memory with Datum and isnull
arrays and any varlena attributes at the end of the contiguous
allocation. These could quickly be copied into a VirtualSlot with
zero deforming. I've not given this too much thought yet, but if I
were to do this I'd be aiming to store the cached tuple this way too, to
save having to deform it each time we get a cache hit. We'd use more
memory storing entries this way, but if we're not expecting the Result
Cache to fill work_mem, then perhaps it's another approach that the
planner could decide on. Perhaps the cached tuple pointer could be a
union to allow us to store either without making the struct any
larger.

However, FWIW, I'd prefer to think about this later though.
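
For illustration only, here's a very rough sketch of that idea (nothing like
this exists in the patch; the struct and field names are made up):

typedef struct MemTuple
{
	int			natts;			/* number of attributes stored */
	Datum	   *values;			/* points into this same allocation */
	bool	   *isnull;			/* points into this same allocation */
	/* any by-reference attribute data follows the two arrays within the
	 * same contiguous palloc'd chunk */
} MemTuple;

A cache hit would then just memcpy the values/isnull arrays into a virtual
output slot's tts_values/tts_isnull and call ExecStoreVirtualTuple(), with
no deforming step at all.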

4. I think the ExecClearTuple in prepare_probe_slot is not strictly necessary,
since tts_values/tts_flags/tts_nvalid are all reset later, and tts_tid is not
really used in our case. Since both prepare_probe_slot
and ResultCacheHash_equal are in a pretty hot path, we may want to consider it.

I agree that it would be nice not to do the ExecClearTuple(), but the
only way I can see to get rid of it also requires getting rid of the
ExecStoreVirtualTuple(). The problem is ExecStoreVirtualTuple()
Asserts that the slot is empty, which it won't be the second time
around unless we ExecClearTuple it. It seems that to make that work
we'd have to manually set slot->tts_nvalid. I see other places in the
code doing this ExecClearTuple() / ExecStoreVirtualTuple() dance, so I
don't think it's going to be up to this patch to start making
optimisations just for this 1 case.
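
For anyone following along, the dance in question looks roughly like this
(simplified from prepare_probe_slot(); the loop body here is illustrative
rather than copied from the patch):

	ExecClearTuple(pslot);		/* only needed to satisfy the Assert below */
	for (int i = 0; i < nkeys; i++)
		pslot->tts_values[i] = ExecEvalExpr(param_exprs[i], econtext,
											&pslot->tts_isnull[i]);
	ExecStoreVirtualTuple(pslot);	/* Asserts that the slot is empty */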

David

#73David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#71)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Sun, 6 Dec 2020 at 03:52, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Dec 4, 2020 at 10:41 PM David Rowley <dgrowleyml@gmail.com> wrote:

I also
noticed that the code I'd written to build the cache lookup expression
included a step to deform the outer tuple. This was unnecessary and
slowed down the expression evaluation.

I thought it would be something like my 3rd suggestion on [1], but after
reading the code it looks like it isn't. Could you explain what the change is?
I probably missed something.

Basically, an extra argument in ExecBuildParamSetEqual() which allows
the TupleTableSlotOps for the left and right side to be set
individually. Previously I was passing a single TupleTableSlotOps of
TTSOpsMinimalTuple. The probeslot is a TTSOpsVirtual tuple, so
passing TTSOpsMinimalTuple causes the function to add a needless
EEOP_OUTER_FETCHSOME step to the expression.
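
To illustrate, the call ends up looking something like the following (the
variable names and exact argument order here are from memory rather than
copied from the patch):

	rcstate->cache_eq_expr = ExecBuildParamSetEqual(hashkeydesc,
													&TTSOpsMinimalTuple, /* cached keys */
													&TTSOpsVirtual,		 /* probeslot */
													eqfuncoids,
													collations,
													param_exprs,
													(PlanState *) rcstate);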

David

#74David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#73)
5 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

I've attached another patchset that addresses some comments left by
Zhihong Yu over on [1]/messages/by-id/CALNJ-vRAgksPqjK-sAU+9gu3R44s_3jVPJ_5SDB++jjEkTntiA@mail.gmail.com. The version number got bumped to v12 instead
of v11 as I still have a copy of the other version of the patch which
I made some changes to and internally named v11.

The patchset has grown one additional patch, the 0004 patch.
The review on the other thread mentioned that I should remove the code
duplication for the full cache check, which I had mostly duplicated
between adding a new entry to the cache and adding a tuple to an
existing entry. I'm still a bit unsure that I like merging this into
a helper function. One call needs the return value of the function to
be a boolean value to know if it's still okay to use the cache. The
other needs the return value to be the cache entry. The patch makes the
helper function return the entry and return NULL to communicate the
false value. I'm not a fan of the change and might drop it.
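
Roughly, the two call sites in the 0004 patch below end up as:

	/* in cache_lookup(), where the caller wants the (possibly moved) entry */
	return cache_check_mem(rcstate, entry);

	/* in cache_store_tuple(), where the caller only needs a yes/no answer */
	rcstate->entry = entry = cache_check_mem(rcstate, entry);
	return (entry != NULL);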

The 0005 patch is now the only one that I think needs more work to
make it good enough. This is Result Cache for subplans. I mentioned
in [2]/messages/by-id/CAApHDvpGX7RN+sh7Hn9HWZQKp53SjKaL=GtDzYheHWiEd-8moQ@mail.gmail.com what my problem with that patch is.

On Mon, 7 Dec 2020 at 12:50, David Rowley <dgrowleyml@gmail.com> wrote:

Basically, an extra argument in ExecBuildParamSetEqual() which allows
the TupleTableSlotOps for the left and right side to be set
individually. Previously I was passing a single TupleTableSlotOps of
TTSOpsMinimalTuple. The probeslot is a TTSOpsVirtual tuple, so
passing TTSOpsMinimalTuple causes the function to add a needless
EEOP_OUTER_FETCHSOME step to the expression.

I also benchmarked that change and saw that it gives a small but
notable improvement in performance.

David

[1]: /messages/by-id/CALNJ-vRAgksPqjK-sAU+9gu3R44s_3jVPJ_5SDB++jjEkTntiA@mail.gmail.com
[2]: /messages/by-id/CAApHDvpGX7RN+sh7Hn9HWZQKp53SjKaL=GtDzYheHWiEd-8moQ@mail.gmail.com

Attachments:

v12-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch
From cfbfb8187f4e8303fe3358b5c909533ee6629efe Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v12 1/5] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits so that it can pass back additional
details to the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..796ece6b3b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2953,7 +2953,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..d2bf9912e9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index bcb1bc6097..4f6ab5d635 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1986,6 +1986,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..ea7b0dd601 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4793,7 +4795,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 745f443e5c..f33033bc27 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 51478957fb..e1aaeecc8a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 80bd60f876..910515ffb2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 3a2cfb7efa..a50e9ad5f4 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -65,6 +65,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	int			flags;			/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -194,7 +208,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

v12-0004-Remove-code-duplication-in-nodeResultCache.c.patch
From d9c3f2cab13ec26bbd8d1245be6304c506e1f878 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v12 4/5] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 5b58c2f059..b1b4f22a03 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -431,6 +431,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so, reduce
+ *		the memory used by the cache.  Returns the cache entry belonging to
+ *		'entry', which may have changed address by shuffling the deleted
+ *		entries back to their optimal position.  Returns NULL if the attempt
+ *		to free enough memory resulted in 'entry' itself being evicted from
+ *		the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -493,44 +541,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -576,41 +587,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

v12-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch
From e38153eda9fbe5c7bd5cb9e4f4a2b579e0658927 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v12 2/5] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, so it can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v12-0005-Use-a-Result-Cache-node-to-cache-results-from-su.patch
From 51824631a6332265390c87a169f726e687e7df3a Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Fri, 4 Dec 2020 00:39:48 +1300
Subject: [PATCH v12 5/5] Use a Result Cache node to cache results from
 subplans

---
 .../postgres_fdw/expected/postgres_fdw.out    |  49 +++++----
 src/backend/optimizer/plan/subselect.c        | 103 ++++++++++++++++++
 src/test/regress/expected/aggregates.out      |   6 +-
 src/test/regress/expected/groupingsets.out    |  20 ++--
 src/test/regress/expected/join.out            |  16 +--
 src/test/regress/expected/join_hash.out       |  58 +++++++---
 src/test/regress/expected/resultcache.out     |  37 +++++++
 src/test/regress/expected/rowsecurity.out     |  20 ++--
 src/test/regress/expected/select_parallel.out |  28 +++--
 src/test/regress/expected/subselect.out       |  20 ++--
 src/test/regress/sql/resultcache.sql          |   9 ++
 11 files changed, 287 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index aaa7544177..fc9e18d636 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2112,22 +2112,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2908,10 +2911,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2922,8 +2928,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2933,11 +2939,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 7a38a1a4ae..9b93cb27ac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -137,6 +138,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side-effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -234,6 +303,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless. It may be useful if we can only do this when it seems
+	 * likely that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *rcpath;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any ideas
+			 * what that'll be.
+			 */
+			rcpath = create_resultcache_path(root,
+											 best_path->parent,
+											 best_path,
+											 param_exprs,
+											 operators,
+											 false,
+											 -1);
+			best_path = (Path *) rcpath;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1eb0f7346b..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 701d52b465..2256f6da67 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -774,19 +774,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index ff96002c07..26302f3abd 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2976,8 +2976,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2987,11 +2987,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..9f04684fcd 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_2.c * 5)
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_1.b * 5)
-(28 rows)
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_1.b * 5)
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
          SubPlan 2
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_2.c * 5)
-(28 rows)
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
index 205cbb82ab..7870102f0a 100644
--- a/src/test/regress/expected/resultcache.out
+++ b/src/test/regress/expected/resultcache.out
@@ -151,3 +151,40 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+                                explain_resultcache                                
+-----------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+   Recheck Cond: (unique1 < 1000)
+   Heap Blocks: exact=333
+   ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+         Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.twenty
+           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(13 rows)
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 9000  Misses: 1000  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 9b0c418db7..a3caf95c8d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1168,9 +1172,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 0b023a0bbb..2faf4a6c40 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -968,19 +968,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
index 2a84cf3845..bbd1bcd013 100644
--- a/src/test/regress/sql/resultcache.sql
+++ b/src/test/regress/sql/resultcache.sql
@@ -76,3 +76,12 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
-- 
2.27.0

v12-0003-Add-Result-Cache-executor-node.patchtext/plain; charset=US-ASCII; name=v12-0003-Add-Result-Cache-executor-node.patchDownload
From bd96ba78a71a8b2348c0ea110594bca049ea904a Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v12 3/5] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven by the expected cache hit ratio.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table would never be looked up and a hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  147 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1134 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  227 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   70 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2754 insertions(+), 174 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..aaa7544177 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1573,6 +1573,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1599,6 +1600,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..2e533999d1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -479,10 +479,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d6901c..f91d7bfc55 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4849,6 +4849,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        node to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..fadadef050 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1264,6 +1266,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1955,6 +1960,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3028,6 +3037,144 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we have freed memory, so we must use mem_used
+	 * when mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do anything.  We needn't consider
+			 * cache hits as we'll always get a miss before a hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 0c10f1d35c..f5786e9205 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 79b325c7cf..86ff12537c 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3466,3 +3466,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of OIDs of the equality functions to use; this must be
+ * the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..b1b313aae6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 01b7b926bf..fbbe667cc1 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..5b58c2f059
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1134 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from it.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used cache entry.
+ * We track recency by pushing new entries, and entries we look up, onto the
+ * tail of a doubly linked list.  This means that the least recently used
+ * items drift towards the head of that LRU list, which is where eviction
+ * starts.
+ *
+ * Sometimes our callers won't run their scans to completion.  For example, a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
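+ * As an illustrative sketch (the relation and index names here are made up,
+ * not taken from any particular test), a plan using one of these nodes for a
+ * parameterized nested loop join might look like:
+ *
+ *   Nested Loop
+ *     ->  Seq Scan on outer_rel
+ *     ->  Result Cache
+ *           Cache Key: outer_rel.x
+ *           ->  Index Scan using inner_rel_x_idx on inner_rel
+ *                 Index Cond: (x = outer_rel.x)
+ *
+ * Here the Result Cache only asks its subnode to scan inner_rel for values
+ * of outer_rel.x which have not been seen (and cached) already.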
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- materialize the result of a subplan
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
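+/*
+ * A rough sketch of the state transitions (see ExecResultCache() below for
+ * the details): each scan starts in RC_CACHE_LOOKUP.  On a cache hit we move
+ * to RC_CACHE_FETCH_NEXT_TUPLE and return cached tuples until we reach
+ * RC_END_OF_SCAN.  On a miss we read tuples from the subplan in
+ * RC_FILLING_CACHE, or fall back to RC_CACHE_BYPASS_MODE if the tuples for
+ * the current parameters won't fit within the memory budget.  A rescan puts
+ * us back into RC_CACHE_LOOKUP.
+ */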
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
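+/*
+ * simplehash.h is included twice.  The first include, with SH_DECLARE, only
+ * declares struct resultcache_hash and the function prototypes, which allows
+ * ResultCacheHash_hash() and ResultCacheHash_equal() below to reference the
+ * table type.  The second include, with SH_DEFINE, then generates the table
+ * implementation itself, plugging those two functions in via SH_HASH_KEY and
+ * SH_EQUAL.
+ */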
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match. e.g. A join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
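
To make the state machine above a little more concrete, here is a small
standalone toy (not part of the patch; all of the names and the direct-mapped
"hash table" are invented for brevity) that mimics the cache hit / cache miss /
bypass behaviour ExecResultCache goes through for each set of parameter values.
The real node of course uses simplehash.h, an LRU list and work_mem-based
eviction rather than this fixed array:

#include <stdio.h>
#include <stdbool.h>

#define MAX_TUPLES_PER_ENTRY 4		/* pretend an entry can't hold more */
#define NBUCKETS 16

typedef struct ToyEntry
{
	bool		used;
	bool		complete;		/* did the scan run to completion? */
	int			key;
	int			ntuples;
	int			tuples[MAX_TUPLES_PER_ENTRY];
} ToyEntry;

static ToyEntry cache[NBUCKETS];
static long hits = 0,
			misses = 0,
			overflows = 0;

/* stand-in for rescanning the parameterized inner plan */
static int
inner_scan(int param, int *out, int outsz)
{
	int			n = param % 6;	/* pretend the parameter decides the row count */
	int			i;

	for (i = 0; i < n && i < outsz; i++)
		out[i] = param * 10 + i;
	return (n <= outsz) ? n : -1;	/* -1: too many rows to cache */
}

static void
scan_with_cache(int param)
{
	ToyEntry   *e = &cache[param % NBUCKETS];

	if (e->used && e->key == param && e->complete)
	{
		hits++;					/* like RC_CACHE_FETCH_NEXT_TUPLE: replay e->tuples */
		return;
	}

	misses++;					/* like an RC_CACHE_LOOKUP miss: build a fresh entry */
	e->used = true;
	e->complete = false;
	e->key = param;
	e->ntuples = inner_scan(param, e->tuples, MAX_TUPLES_PER_ENTRY);

	if (e->ntuples < 0)
	{
		overflows++;			/* like RC_CACHE_BYPASS_MODE: too big to keep */
		e->used = false;
		return;
	}
	e->complete = true;			/* like setting entry->complete at end of scan */
}

int
main(void)
{
	int			params[] = {1, 2, 1, 3, 1, 2, 5, 5};
	int			i;

	for (i = 0; i < 8; i++)
		scan_with_cache(params[i]);

	/* prints: hits=3 misses=5 overflows=2 */
	printf("hits=%ld misses=%ld overflows=%ld\n", hits, misses, overflows);
	return 0;
}
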
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 910906f639..10b55f33ad 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -925,6 +925,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4930,6 +4957,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9c73c605a4..ad265c8e90 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -834,6 +834,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1908,6 +1923,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3823,6 +3853,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4057,6 +4090,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 169d5581b9..75e766387a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2150,6 +2150,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2832,6 +2852,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b064a..9f538814c5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4166,6 +4166,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d2bf9912e9..f550d36407 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2306,6 +2308,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimates of how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values exceeds the number of
+	 * entries we can store in the cache, we'll have to evict some entries
+	 * from the cache.  This is not free.  Here we estimate how often we'll
+	 * incur the cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it's a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4046,6 +4189,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
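
As a sanity check on the arithmetic in cost_resultcache_rescan() above, here
is a tiny standalone snippet (not part of the patch; the numbers are simply
made up) that plugs some example values into the same eviction and hit ratio
formulas:

#include <stdio.h>

#define Min(a,b) ((a) < (b) ? (a) : (b))
#define Max(a,b) ((a) > (b) ? (a) : (b))

int
main(void)
{
	double		calls = 10000.0;	/* expected rescans of the node */
	double		ndistinct = 5000.0; /* distinct parameter values */
	double		est_cache_entries = 1000.0; /* what fits in hash_mem */
	double		evict_ratio;
	double		hit_ratio;

	/* same formulas as cost_resultcache_rescan() */
	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
		(ndistinct / calls);
	hit_ratio = Max(hit_ratio, 0.0);

	/* prints: evict_ratio=0.80 hit_ratio=0.00 -- caching looks unattractive */
	printf("evict_ratio=%.2f hit_ratio=%.2f\n", evict_ratio, hit_ratio);

	/* with ndistinct = 100 instead, everything fits: evict_ratio is 0.00 and
	 * hit_ratio becomes 0.99 */
	return 0;
}
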
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 4a35903b29..53d8df3632 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,193 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, we fetch the outer side exprs and check for a valid hashable
+ * equality operator for each one.  Returns true and sets the 'param_exprs'
+ * and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a ResultCache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+			var_relids = pull_varnos((Node *) ((PlaceHolderVar *) expr)->phexpr);
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node is likely to be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1669,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1678,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1848,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1873,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 40abe6f9f6..820f679f69 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1514,6 +1527,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6343,6 +6406,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6929,6 +7014,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6974,6 +7060,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 127ea3d856..9ba06671c0 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -735,6 +735,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index fcce81926b..7a38a1a4ae 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2748,6 +2748,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e1aaeecc8a..e8dbc90fd6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1551,6 +1551,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3852,6 +3901,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4070,6 +4130,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 635d91d50a..92bf3da6d9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1016,6 +1016,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9c9091e601..599ab6d850 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -365,6 +365,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 0c48d2a519..8e76f63635 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6f..fcafc03725 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
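
For what it's worth, here is a sketch (not taken from the patch) of how the
new dlist_move_tail() is intended to be used for LRU bookkeeping, assuming the
usual backend includes: the least recently used entry stays at the head and
the most recently used at the tail, so eviction always starts from the head.

#include "postgres.h"
#include "lib/ilist.h"

typedef struct LruEntry
{
	dlist_node	lru_node;		/* membership in the LRU list */
	/* ... the cache key and cached tuples would live here ... */
} LruEntry;

/* assumed to have been set up with dlist_init(&lru_list) at node startup */
static dlist_head lru_list;

/* on a cache hit, make 'entry' the most recently used */
static void
lru_touch(LruEntry *entry)
{
	dlist_move_tail(&lru_list, &entry->lru_node);
}

/* when over the memory budget, evict starting from the head (the LRU end) */
static LruEntry *
lru_next_victim(void)
{
	if (dlist_is_empty(&lru_list))
		return NULL;
	return dlist_container(LruEntry, lru_node, dlist_head_node(&lru_list));
}
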
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..2325dbf2b1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1968,6 +1969,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 3684f87a88..39a9502e87 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -239,6 +241,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b4059895de..e66f6e74be 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1462,6 +1462,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a node that
+ * caches tuples from its parameterized subpath so that the subpath does not
+ * have to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7e6b10f86b..64f752d9fc 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 8e621d2f76..b5a20fa01e 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 3bd7072ae8..fa13a6df37 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..1eb0f7346b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2577,6 +2577,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2593,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b0533a7195..ff96002c07 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4909,34 +4927,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4991,14 +5015,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index c72a6d051f..141a6c89e2 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2061,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2071,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2144,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2180,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2217,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..205cbb82ab
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 9d56cdacf3..0b023a0bbb 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..cbf371017e 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index ae89ed7f0b..8fee8ad621 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 525bdc804f..4be9f4e99e 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -199,6 +199,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 00720b629a..bff0d67e79 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ffd5fe8b0d..a55711cc7f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..2a84cf3845
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

#75David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#74)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 8 Dec 2020 at 20:15, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached another patchset that addresses some comments left by
Zhihong Yu over on [1]. The version number got bumped to v12 instead
of v11 as I still have a copy of the other version of the patch which
I made some changes to and internally named v11.

If anyone else wants to have a look at these, please do so soon. I'm
planning on starting to take a serious look at getting 0001-0003 in
early next week.

David

#76Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: David Rowley (#75)
2 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On 09.12.2020 23:53, David Rowley wrote:

On Tue, 8 Dec 2020 at 20:15, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached another patchset that addresses some comments left by
Zhihong Yu over on [1]. The version number got bumped to v12 instead
of v11 as I still have a copy of the other version of the patch which
I made some changes to and internally named v11.

If anyone else wants to have a look at these, please do so soon. I'm
planning on starting to take a serious look at getting 0001-0003 in
early next week.

David

I tested the patched version of Postgres on the JOB benchmark:

https://github.com/gregrahn/join-order-benchmark

For most queries performance is the same, and some queries are executed
faster, but one query is 150 times slower:

explain analyze SELECT MIN(chn.name) AS character,
       MIN(t.title) AS movie_with_american_producer
FROM char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     company_type AS ct,
     movie_companies AS mc,
     role_type AS rt,
     title AS t
WHERE ci.note LIKE '%(producer)%'
  AND cn.country_code = '[us]'
  AND t.production_year > 1990
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND ci.movie_id = mc.movie_id
  AND chn.id = ci.person_role_id
  AND rt.id = ci.role_id
  AND cn.id = mc.company_id
  AND ct.id = mc.company_type_id;
explain analyze SELECT MIN(cn.name) AS from_company,
       MIN(lt.link) AS movie_link_type,
       MIN(t.title) AS non_polish_sequel_movie
FROM company_name AS cn,
     company_type AS ct,
     keyword AS k,
     link_type AS lt,
     movie_companies AS mc,
     movie_keyword AS mk,
     movie_link AS ml,
     title AS t
WHERE cn.country_code !='[pl]'
  AND (cn.name LIKE '%Film%'
       OR cn.name LIKE '%Warner%')
  AND ct.kind ='production companies'
  AND k.keyword ='sequel'
  AND lt.link LIKE '%follow%'
  AND mc.note IS NULL
  AND t.production_year BETWEEN 1950 AND 2000
  AND lt.id = ml.link_type_id
  AND ml.movie_id = t.id
  AND t.id = mk.movie_id
  AND mk.keyword_id = k.id
  AND t.id = mc.movie_id
  AND mc.company_type_id = ct.id
  AND mc.company_id = cn.id
  AND ml.movie_id = mk.movie_id
  AND ml.movie_id = mc.movie_id
  AND mk.movie_id = mc.movie_id;

                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=300131.43..300131.44 rows=1 width=64) (actual time=522985.919..522993.614 rows=1 loops=1)
   ->  Gather  (cost=300131.00..300131.41 rows=4 width=64) (actual time=522985.908..522993.606 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=299131.00..299131.01 rows=1 width=64) (actual time=522726.599..522726.606 rows=1 loops=5)
               ->  Hash Join  (cost=38559.78..298508.36 rows=124527 width=33) (actual time=301521.477..522726.592 rows=2 loops=5)
                     Hash Cond: (ci.role_id = rt.id)
                     ->  Hash Join  (cost=38558.51..298064.76 rows=124527 width=37) (actual time=301521.418..522726.529 rows=2 loops=5)
                           Hash Cond: (mc.company_type_id = ct.id)
                           ->  Nested Loop  (cost=38557.42..297390.45 rows=124527 width=41) (actual time=301521.392..522726.498 rows=2 loops=5)
                                 ->  Nested Loop  (cost=38556.98..287632.46 rows=255650 width=29) (actual time=235.183..4596.950 rows=156421 loops=5)
                                       Join Filter: (t.id = ci.movie_id)
                                       ->  Parallel Hash Join  (cost=38556.53..84611.99 rows=162109 width=29) (actual time=234.991..718.934 rows=119250 loops=5)
                                             Hash Cond: (t.id = mc.movie_id)
                                             ->  Parallel Seq Scan on title t  (cost=0.00..43899.19 rows=435558 width=21) (actual time=0.010..178.332 rows=349806 loops=5)
                                                   Filter: (production_year > 1990)
                                                   Rows Removed by Filter: 155856
                                             ->  Parallel Hash  (cost=34762.05..34762.05 rows=303558 width=8) (actual time=234.282..234.285 rows=230760 loops=5)
                                                   Buckets: 2097152 (originally 1048576)  Batches: 1 (originally 1)  Memory Usage: 69792kB
                                                   ->  Parallel Hash Join  (cost=5346.12..34762.05 rows=303558 width=8) (actual time=11.846..160.085 rows=230760 loops=5)
                                                         Hash Cond: (mc.company_id = cn.id)
                                                         ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..27206.55 rows=841655 width=12) (actual time=0.013..40.426 rows=521826 loops=5)
                                                         ->  Parallel Hash  (cost=4722.92..4722.92 rows=49856 width=4) (actual time=11.658..11.659 rows=16969 loops=5)
                                                               Buckets: 131072  Batches: 1  Memory Usage: 4448kB
                                                               ->  Parallel Seq Scan on company_name cn  (cost=0.00..4722.92 rows=49856 width=4) (actual time=0.014..8.324 rows=16969 loops=5)
                                                                     Filter: ((country_code)::text = '[us]'::text)
                                                                     Rows Removed by Filter: 30031
                                       ->  Result Cache  (cost=0.45..1.65 rows=2 width=12) (actual time=0.019..0.030 rows=1 loops=596250)
                                             Cache Key: mc.movie_id
                                             Hits: 55970  Misses: 62602  Evictions: 0  Overflows: 0  Memory Usage: 6824kB
                                             Worker 0:  Hits: 56042  Misses: 63657  Evictions: 0  Overflows: 0  Memory Usage: 6924kB
                                             Worker 1:  Hits: 56067  Misses: 63659  Evictions: 0  Overflows: 0  Memory Usage: 6906kB
                                             Worker 2:  Hits: 55947  Misses: 62171  Evictions: 0  Overflows: 0  Memory Usage: 6767kB
                                             Worker 3:  Hits: 56150  Misses: 63985  Evictions: 0  Overflows: 0  Memory Usage: 6945kB
                                             ->  Index Scan using cast_info_movie_id_idx on cast_info ci  (cost=0.44..1.64 rows=2 width=12) (actual time=0.033..0.053 rows=1 loops=316074)
                                                   Index Cond: (movie_id = mc.movie_id)
                                                   Filter: ((note)::text ~~ '%(producer)%'::text)
                                                   Rows Removed by Filter: 25
                                 ->  Result Cache  (cost=0.44..0.59 rows=1 width=20) (actual time=3.311..3.311 rows=0 loops=782104)
                                       Cache Key: ci.person_role_id
                                       Hits: 5  Misses: 156294  Evictions: 0  Overflows: 0  Memory Usage: 9769kB
                                       Worker 0:  Hits: 0  Misses: 156768  Evictions: 0  Overflows: 0  Memory Usage: 9799kB
                                       Worker 1:  Hits: 1  Misses: 156444  Evictions: 0  Overflows: 0  Memory Usage: 9778kB
                                       Worker 2:  Hits: 0  Misses: 156222  Evictions: 0  Overflows: 0  Memory Usage: 9764kB
                                       Worker 3:  Hits: 0  Misses: 156370  Evictions: 0  Overflows: 0  Memory Usage: 9774kB
                                       ->  Index Scan using char_name_pkey on char_name chn  (cost=0.43..0.58 rows=1 width=20) (actual time=0.001..0.001 rows=0 loops=782098)
                                             Index Cond: (id = ci.person_role_id)
                           ->  Hash  (cost=1.04..1.04 rows=4 width=4) (actual time=0.014..0.014 rows=4 loops=5)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Seq Scan on company_type ct  (cost=0.00..1.04 rows=4 width=4) (actual time=0.012..0.012 rows=4 loops=5)
                     ->  Hash  (cost=1.12..1.12 rows=12 width=4) (actual time=0.027..0.028 rows=12 loops=5)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on role_type rt  (cost=0.00..1.12 rows=12 width=4) (actual time=0.022..0.023 rows=12 loops=5)
 Planning Time: 2.398 ms
 Execution Time: 523002.608 ms
(55 rows)

I attach file with times of query execution.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

join.sql (application/sql)
results.csv (text/csv)
#77David Rowley
dgrowleyml@gmail.com
In reply to: Konstantin Knizhnik (#76)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks a lot for testing this patch. It's good to see it run through a
benchmark that exercises quite a few join problems.

On Fri, 11 Dec 2020 at 05:44, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

For most queries performance is the same, some queries are executed
faster but
one query is 150 times slower:

explain analyze SELECT MIN(chn.name) AS character,

...

Execution Time: 523002.608 ms

I attach file with times of query execution.

I noticed the time reported in results.csv is exactly the same as the
one in the EXPLAIN ANALYZE above. One thing to note there is that it
would be a bit fairer if the benchmark measured the execution time of
the query rather than the time to EXPLAIN ANALYZE it.

One of the reasons that the patch may look less favourable here is
that the timing overhead on EXPLAIN ANALYZE increases with additional
nodes.

If I just put this to the test by using the tables and query from [1].

# explain (analyze, costs off) select count(*) from hundredk hk inner
# join lookup l on hk.thousand = l.a;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate (actual time=1891.262..1891.263 rows=1 loops=1)
   ->  Nested Loop (actual time=0.312..1318.087 rows=9990000 loops=1)
         ->  Seq Scan on hundredk hk (actual time=0.299..15.753 rows=100000 loops=1)
         ->  Result Cache (actual time=0.000..0.004 rows=100 loops=100000)
               Cache Key: hk.thousand
               Hits: 99000  Misses: 1000  Evictions: 0  Overflows: 0  Memory Usage: 3579kB
               ->  Index Only Scan using lookup_a_idx on lookup l (actual time=0.003..0.012 rows=100 loops=1000)
                     Index Cond: (a = hk.thousand)
                     Heap Fetches: 0
 Planning Time: 3.471 ms
 Execution Time: 1891.612 ms
(11 rows)

You can see here the query took 1.891 seconds to execute.

Same query without EXPLAIN ANALYZE.

postgres=# \timing
Timing is on.
postgres=# select count(*) from hundredk hk inner
postgres-# join lookup l on hk.thousand = l.a;
count
---------
9990000
(1 row)

Time: 539.449 ms

Or is it more accurate to say it took just 0.539 seconds?

Going through the same query after disabling enable_resultcache,
enable_mergejoin and enable_nestloop, I can generate the following
table, which compares the EXPLAIN ANALYZE time to the \timing time.

postgres=# select type, ea_time, timing_time, round(ea_time::numeric / timing_time::numeric, 3) as ea_overhead from results order by timing_time;
      type      | ea_time  | timing_time | ea_overhead
----------------+----------+-------------+-------------
 Nest loop + RC | 1891.612 |     539.449 |       3.507
 Merge join     | 2411.632 |    1008.991 |       2.390
 Nest loop      |  2484.82 |     1049.63 |       2.367
 Hash join      | 4969.284 |    3272.424 |       1.519

Result Cache will be hit a bit harder by this problem due to it having
additional nodes in the plan. The Hash Join query seems to suffer much
less from this problem.
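
As a side note, and not something the benchmark above did: if only the
total runtime is of interest, most of that per-node instrumentation
overhead can be taken out of the picture by switching timing off. A
minimal sketch, reusing the hundredk/lookup tables from [1]:

explain (analyze, timing off, costs off)
select count(*) from hundredk hk
inner join lookup l on hk.thousand = l.a;

Row counts and the cache Hits/Misses counters still show up in the
output; it's only the per-node clock reads that are skipped.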

Having said that, it's certainly not the entire problem here:

Hits: 5 Misses: 156294 Evictions: 0 Overflows: 0 Memory Usage: 9769kB

The planner must have thought there'd be more hits than that, or it
wouldn't have picked Result Caching as a good plan. Estimating the
cache hit ratio using n_distinct becomes much less reliable when there
are joins and filters involved, a.k.a. the real world.
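
For anyone who wants to check what the planner had to work with here,
the per-column estimate is visible in pg_stats. A rough sketch,
assuming the JOB schema from Konstantin's run, with
cast_info.person_role_id being the cache key of the badly-missing
Result Cache above:

select attname, n_distinct
from pg_stats
where tablename = 'cast_info' and attname = 'person_role_id';

A negative n_distinct means the value is a fraction of the table's row
count, and of course this raw per-column number says nothing about how
the distribution looks after the other join clauses and filters have
been applied, which is exactly the problem.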

David

[1]: /messages/by-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com

#78Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#74)
5 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

@cfbot: rebased on 55dc86eca70b1dc18a79c141b3567efed910329d

On Tue, Dec 08, 2020 at 08:15:52PM +1300, David Rowley wrote:

From cfbfb8187f4e8303fe3358b5c909533ee6629efe Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v12 1/5] Allow estimate_num_groups() to pass back further
details about the estimation

+#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one

...

+typedef struct EstimationInfo
+{
+	int			flags;			/* Flags, as defined above to mark special
+								 * properties of the estimation. */

Maybe it should be a bits32?
(Also, according to Michael, some people preferred 0x01 to 1<<0.)

+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);

I think these assertions aren't useful since the type is unsigned:
+ uint64 mem_used; /* bytes of memory used by cache */

+ hash_mem_bytes = get_hash_mem() * 1024L;

I think "result cache nodes" should be added here:

doc/src/sgml/config.sgml- <para>
doc/src/sgml/config.sgml- Hash-based operations are generally more sensitive to memory
doc/src/sgml/config.sgml- availability than equivalent sort-based operations. The
doc/src/sgml/config.sgml- memory available for hash tables is computed by multiplying
doc/src/sgml/config.sgml- <varname>work_mem</varname> by
doc/src/sgml/config.sgml: <varname>hash_mem_multiplier</varname>. This makes it
doc/src/sgml/config.sgml- possible for hash-based operations to use an amount of memory
doc/src/sgml/config.sgml- that exceeds the usual <varname>work_mem</varname> base
doc/src/sgml/config.sgml- amount.
doc/src/sgml/config.sgml- </para>

Language fixen follow:

+ * Initialize the hash table to empty.

as empty

+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evalulating

sp: evaluating

From d9c3f2cab13ec26bbd8d1245be6304c506e1f878 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v12 4/5] Remove code duplication in nodeResultCache.c

+ * cache_check_mem
+ *		Check if we've allocate more than our memory budget and, if so, reduce

allocated

XXX: what patch???

+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once we make a call to the excutor here to ask it what memory

spell: executor
once COMMA

+ * inappropriate to do so. If we see that this has been done then we'll

done COMMA

+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, well take this opportunity to set the path's est_entries.

we'll

+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the

zero COMMA

+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of if it's a cache hit or not.

"whether it's a cache hit or not"

+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of if it was a cache hit or not.

same

+ * get_resultcache_path
+ *		If possible,.make and return a Result Cache path atop of 'inner_path'.

dotmake

+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evitions.  We're unable to validate the hits and misses

evictions

--
Justin

Attachments:

0001-Allow-estimate_num_groups-to-pass-back-further-detai.patch (text/x-diff; charset=us-ascii)
From d6f6025040ef3ab8d28fbe8b5286df3d16f4397b Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH 1/5] Allow estimate_num_groups() to pass back further details
 about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits to allow it to pass back additional
details to the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ce42ce3f1..43eca1f509 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3067,7 +3067,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..aaff28ac52 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4e6497ff32..baa6c5245a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4793,7 +4795,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 86f794c193..f35b162308 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d465b9e213..7e45e0ffdf 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1688,6 +1688,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 47ca4ddbb5..d37faee446 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..ca05a64c42 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	int			flags;			/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.17.0

0002-Allow-users-of-simplehash.h-to-perform-direct-deleti.patch (text/x-diff; charset=us-ascii)
From 3e8a8dafa7988ff179b3c2f25a83095f372aa792 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH 2/5] Allow users of simplehash.h to perform direct deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an up-coming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.17.0

0003-Add-Result-Cache-executor-node.patch (text/x-diff; charset=us-ascii)
From 62771591a4797449e76b41dcc66d05c8ba3534ff Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH 3/5] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   18 +
 src/backend/commands/explain.c                |  147 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1134 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  227 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   70 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2754 insertions(+), 174 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 07e06e5bf7..ee2582cf65 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1595,6 +1595,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1621,6 +1622,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 647192cf6a..d337c9c906 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -501,10 +501,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1037df5a9..52eef11c20 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4855,6 +4855,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans to the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..7f0df0239b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1280,6 +1282,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1971,6 +1976,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3044,6 +3053,144 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we free'd memory, so we must use mem_used
+	 * when mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do anything.  We needn't consider
+			 * cache hits as we'll always get a miss before a hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..41506c4e13 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 8fc2a2666b..921211fcb6 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3494,3 +3494,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqFunctions: array of function oids of the equality functions to use
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		scratch.opcode = finfo->fn_strict ? EEOP_FUNCEXPR_STRICT :
+			EEOP_FUNCEXPR;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..3e0508a1f4 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..5b58c2f059
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1134 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- materialize the result of a subplan
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len);
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+ /* ResultCacheTuple Stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evalulating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Ensure we didn't mess up the tracking somehow */
+	Assert(rcstate->mem_used >= 0);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In such cases, the cache entry is complete after
+	 * getting the first tuple, so we can mark it as such right away.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ba3ccc712c..e2556214cb 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -925,6 +925,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4945,6 +4972,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8392be6d44..a488dfa22e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -834,6 +834,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1909,6 +1924,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3833,6 +3863,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4067,6 +4100,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d2c8d58070..d660eba5b2 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2151,6 +2151,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2833,6 +2853,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 026a4b0848..26b63a99b4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4166,6 +4166,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aaff28ac52..79fc171df3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2306,6 +2308,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what memory
+	 * overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values exceeds the number of
+	 * entries we can store in the cache, we'll have to evict some entries
+	 * from the cache.  This is not free, so here we estimate how often we'll
+	 * incur the cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
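+
+	/*
+	 * Worked example (made-up numbers, for illustration only): with hash_mem
+	 * of 4MB and est_entry_bytes of about 2kB, est_cache_entries comes out at
+	 * 2048.  With calls = 10000 and ndistinct = 1000, every distinct
+	 * parameter set fits in the cache, so evict_ratio = 1.0 - 1000/1000 = 0.0
+	 * and hit_ratio = (1.0 / 1000) * 1000 - (1000 / 10000) = 0.9, i.e. we
+	 * expect about 90% of rescans to be answered from the cache.  If
+	 * ndistinct were instead 4096, only half of the distinct values could be
+	 * cached at once: evict_ratio = 0.5 and hit_ratio = 2048/4096 -
+	 * 4096/10000, or roughly 0.09.
+	 */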
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4046,6 +4189,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..94bb5cb849 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,193 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, we fetch the outer side exprs and check for a valid hashable
+ * equality operator for each one.  Returns true and sets the 'param_exprs'
+ * and 'operators' output parameters if caching is possible.
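+ *
+ * For example (hypothetical table and column names): for a parameterized
+ * clause such as inner_tab.y = outer_tab.x, we'd add the outer expr
+ * outer_tab.x to 'param_exprs' and the default equality operator for its
+ * type to 'operators'.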
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info, List **param_exprs,
+							List **operators, RelOptInfo *outerrel,
+							RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a ResultCache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+			var_relids = pull_varnos(root, (Node *) ((PlaceHolderVar *) expr)->phexpr);
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
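+ *
+ *		For illustration only (table, column and index names here are made
+ *		up; this is not output produced by this function), the plan shape
+ *		we're aiming for is:
+ *
+ *			Nested Loop
+ *			  ->  Seq Scan on outer_tab
+ *			  ->  Result Cache
+ *					Cache Key: outer_tab.x
+ *					->  Index Scan using inner_tab_x_idx on inner_tab
+ *						  Index Cond: (x = outer_tab.x)
+ *
+ *		where repeated values of outer_tab.x are answered from the cache
+ *		rather than by rescanning the inner index.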
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  A result cache path would likely be rejected later on cost
+	 * grounds anyway, so this is really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root, inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1669,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1678,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1848,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1873,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 25d4750ca6..2fe57ce885 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1514,6 +1527,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6348,6 +6411,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6934,6 +7019,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6979,6 +7065,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..9584cdb653 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -735,6 +735,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..92ad54e41e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2748,6 +2748,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7e45e0ffdf..033f2c4894 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1551,6 +1551,55 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  The planner may choose to set this to
+	 * some better value, but if left at 0 then the executor will just use a
+	 * predefined hash table size for the cache.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3852,6 +3901,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->partitioned_rels,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4070,6 +4130,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..07e5698a82 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1019,6 +1019,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index bd57e917e1..93ffb68c7a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -365,6 +365,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 758c3ca097..344ec8b84e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..d2f3ed9a74
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d65099c94a..cb1a4fd845 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1974,6 +1975,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index caed683ba9..282115ecaa 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -239,6 +241,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index cde2637798..8a48dfa368 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1462,6 +1462,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a node that
+ * caches tuples from a parameterized path to save the underlying node from
+ * being rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..5f0c408007 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each cache key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1dd12d484e 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 23dec14cbd..77d6339fbe 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -79,6 +79,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..1eb0f7346b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2577,6 +2577,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2593,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index c72a6d051f..141a6c89e2 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1930,6 +1930,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2058,8 +2061,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2068,32 +2071,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2102,31 +2108,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2135,30 +2144,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2168,31 +2180,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2202,26 +2217,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..205cbb82ab
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..cbf371017e 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0e1ef71dd..fd0de3199a 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -114,7 +114,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 081fce32e7..285de3e2c0 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -200,6 +200,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ffd5fe8b0d..a55711cc7f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -453,6 +453,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..2a84cf3845
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.17.0
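
To make the behaviour above easier to picture, here is a minimal standalone
sketch of the caching policy that the ResultCacheState fields imply: entries
keyed on the parameter value, kept in most-recently-used order, with the
coldest entry evicted once the budget is exceeded.  This is illustration
only, not code from the patch; the names are invented, it ignores hashing and
tuple storage entirely, and it counts entries rather than bytes of work_mem.

/*
 * Toy model of the eviction policy: lru[0] is the hottest entry and the last
 * occupied slot is the coldest, which is what gets evicted on overflow.
 */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4			/* stand-in for the work_mem budget */

typedef struct ToyEntry
{
	int		param;				/* cache key (one int parameter) */
	long	result;				/* cached "result" for that parameter */
} ToyEntry;

static ToyEntry lru[MAX_ENTRIES];
static int	nentries = 0;

/* Pretend to rescan the expensive inner side for a given parameter. */
static long
expensive_inner_scan(int param)
{
	printf("  cache miss: rescanning inner side for param %d\n", param);
	return (long) param * param;
}

static long
toy_cache_fetch(int param)
{
	int		i;
	ToyEntry hit;

	/* Look for an existing entry; on a hit, promote it to the hot end. */
	for (i = 0; i < nentries; i++)
	{
		if (lru[i].param == param)
		{
			hit = lru[i];
			memmove(&lru[1], &lru[0], i * sizeof(ToyEntry));
			lru[0] = hit;
			printf("  cache hit for param %d\n", param);
			return lru[0].result;
		}
	}

	/* Miss: evict the coldest entry if the cache is full, then insert. */
	if (nentries == MAX_ENTRIES)
		nentries--;				/* drop lru[MAX_ENTRIES - 1] */
	memmove(&lru[1], &lru[0], nentries * sizeof(ToyEntry));
	lru[0].param = param;
	lru[0].result = expensive_inner_scan(param);
	nentries++;
	return lru[0].result;
}

int
main(void)
{
	/* Outer-side parameter values; repeats should become cache hits. */
	int		params[] = {1, 2, 1, 3, 2, 4, 5, 1};
	int		i;

	for (i = 0; i < 8; i++)
		printf("param %d -> %ld\n", params[i], toy_cache_fetch(params[i]));
	return 0;
}

Running it over a stream of repeated parameter values produces the same
hit/miss/eviction pattern that EXPLAIN ANALYZE reports for the Result Cache
node in the regression output above.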

0004-Remove-code-duplication-in-nodeResultCache.c.patchtext/x-diff; charset=us-asciiDownload
From 00150fbc45581006c5a37599eaf6a3e1ef900e56 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH 4/5] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 5b58c2f059..b1b4f22a03 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -431,6 +431,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns 'entry', which may have
+ *		changed address if evicting other entries caused the hash table to
+ *		shuffle elements to other buckets.  Returns NULL if the attempt to
+ *		free enough memory resulted in 'entry' itself being evicted from
+ *		the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -493,44 +541,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -576,41 +587,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.17.0
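
As an aside on the re-lookup that cache_check_mem performs: the comment above
notes that evictions can cause simplehash.h to shuffle surviving entries to
other buckets.  The toy below (again, not patch code; every name here is
invented for illustration) shows the general hazard with any open-addressing
table: after a delete, a pointer taken earlier may no longer reference the
entry for our key, so the safe recovery is to keep the key and do a fresh
lookup.  The toy rebuilds the table after a delete, which is cruder than
simplehash.h's in-place shuffling, but the effect on held pointers is the
same.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define NBUCKETS 8				/* assumed never to fill up in this toy */

typedef struct Slot
{
	bool	used;
	int		key;
} Slot;

static Slot table[NBUCKETS];

static Slot *
insert(int key)
{
	int		i = key % NBUCKETS;

	while (table[i].used)
		i = (i + 1) % NBUCKETS;	/* linear probing */
	table[i].used = true;
	table[i].key = key;
	return &table[i];
}

static Slot *
lookup(int key)
{
	int		i = key % NBUCKETS;

	while (table[i].used)
	{
		if (table[i].key == key)
			return &table[i];
		i = (i + 1) % NBUCKETS;
	}
	return NULL;
}

/* Delete 'key' and rehash the survivors so no probe chains are broken. */
static void
delete_and_rehash(int key)
{
	Slot	old[NBUCKETS];
	int		i;

	memcpy(old, table, sizeof(table));
	memset(table, 0, sizeof(table));
	for (i = 0; i < NBUCKETS; i++)
		if (old[i].used && old[i].key != key)
			insert(old[i].key);
}

int
main(void)
{
	Slot   *entry;

	insert(1);					/* hashes to bucket 1 */
	entry = insert(9);			/* collides with 1, lands in bucket 2 */

	delete_and_rehash(1);		/* "evict" key 1 to free memory */

	/* The pointer we held may now reference the wrong (or an empty) slot. */
	printf("stale pointer: used=%d key=%d\n", entry->used, entry->key);
	printf("fresh lookup:  key 9 is %s\n",
		   lookup(9) ? "still present (at a new address)" : "gone");
	return 0;
}

That is the pattern cache_check_mem follows in the patch: if the entry is no
longer marked in use, or its key pointer differs from the one we saved, it
repopulates the probe slot and looks the entry up again by key.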

0005-Use-a-Result-Cache-node-to-cache-results-from-subpla.patchtext/x-diff; charset=us-asciiDownload
From 41429921736879dde027225d3d2814424f51ec14 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Fri, 4 Dec 2020 00:39:48 +1300
Subject: [PATCH 5/5] Use a Result Cache node to cache results from subplans

---
 .../postgres_fdw/expected/postgres_fdw.out    |  49 +++++----
 src/backend/optimizer/plan/subselect.c        | 103 ++++++++++++++++++
 src/test/regress/expected/aggregates.out      |   6 +-
 src/test/regress/expected/groupingsets.out    |  20 ++--
 .../regress/expected/incremental_sort.out     |  16 ++-
 src/test/regress/expected/join.out            |  16 +--
 src/test/regress/expected/join_hash.out       |  58 +++++++---
 src/test/regress/expected/resultcache.out     |  37 +++++++
 src/test/regress/expected/rowsecurity.out     |  20 ++--
 src/test/regress/expected/select_parallel.out |  28 +++--
 src/test/regress/expected/subselect.out       |  20 ++--
 src/test/regress/sql/resultcache.sql          |   9 ++
 12 files changed, 297 insertions(+), 85 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ee2582cf65..f07b3f0194 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2134,22 +2134,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2930,10 +2933,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2944,8 +2950,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2955,11 +2961,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92ad54e41e..bd648f66b3 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -137,6 +138,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side-effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -234,6 +303,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless. It may be worthwhile to only do this when it seems
+	 * likely that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *rcpath;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			rcpath = create_resultcache_path(root,
+											 best_path->parent,
+											 best_path,
+											 param_exprs,
+											 operators,
+											 false,
+											 -1);
+			best_path = (Path *) rcpath;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1eb0f7346b..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 7c844c6e09..33befe0e7b 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -774,19 +774,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index a8cbfd9f5f..b3cf302af7 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1568,9 +1568,11 @@ from tenk1 t, generate_series(1, 1000);
                      ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t
                      ->  Function Scan on generate_series
                SubPlan 1
-                 ->  Index Only Scan using tenk1_unique1 on tenk1
-                       Index Cond: (unique1 = t.unique1)
-(11 rows)
+                 ->  Result Cache
+                       Cache Key: t.unique1
+                       ->  Index Only Scan using tenk1_unique1 on tenk1
+                             Index Cond: (unique1 = t.unique1)
+(13 rows)
 
 explain (costs off) select
   unique1,
@@ -1587,9 +1589,11 @@ order by 1, 2;
                ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t
                ->  Function Scan on generate_series
          SubPlan 1
-           ->  Index Only Scan using tenk1_unique1 on tenk1
-                 Index Cond: (unique1 = t.unique1)
-(10 rows)
+           ->  Result Cache
+                 Cache Key: t.unique1
+                 ->  Index Only Scan using tenk1_unique1 on tenk1
+                       Index Cond: (unique1 = t.unique1)
+(12 rows)
 
 -- Parallel sort but with expression not available until the upper rel.
 explain (costs off) select distinct sub.unique1, stringu1 || random()::text
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5e6b02cdd7..0cde696292 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2976,8 +2976,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2987,11 +2987,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..9f04684fcd 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_2.c * 5)
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_1.b * 5)
-(28 rows)
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_1.b * 5)
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
          SubPlan 2
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_2.c * 5)
-(28 rows)
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
index 205cbb82ab..7870102f0a 100644
--- a/src/test/regress/expected/resultcache.out
+++ b/src/test/regress/expected/resultcache.out
@@ -151,3 +151,40 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+                                explain_resultcache                                
+-----------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+   Recheck Cond: (unique1 < 1000)
+   Heap Blocks: exact=333
+   ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+         Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.twenty
+           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(13 rows)
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 9000  Misses: 1000  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 9b0c418db7..a3caf95c8d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1168,9 +1172,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index c7986fb7fc..249f76cacc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -968,19 +968,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
index 2a84cf3845..bbd1bcd013 100644
--- a/src/test/regress/sql/resultcache.sql
+++ b/src/test/regress/sql/resultcache.sql
@@ -76,3 +76,12 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
-- 
2.17.0

#79David Rowley
dgrowleyml@gmail.com
In reply to: Justin Pryzby (#78)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for having a look at this.

I've taken most of your suggestions. The things quoted below are just
the ones I didn't agree with or didn't understand.

On Thu, 28 Jan 2021 at 18:43, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Dec 08, 2020 at 08:15:52PM +1300, David Rowley wrote:

+typedef struct EstimationInfo
+{
+     int                     flags;                  /* Flags, as defined above to mark special
+                                                              * properties of the estimation. */

Maybe it should be a bits32 ?

I've changed this to uint32. There are a few examples in the code
base of bit flags using int, e.g. PlannedStmt.jitFlags and
_mdfd_getseg()'s "behavior" parameter; there are also quite a few
using unsigned types.

(Also, according to Michael, some people preferred 0x01 to 1<<0)

I'd rather keep the (1 << 0). I think that it gets much easier to
read when we start using more significant bits. Granted the codebase
has lots of examples of each. I just picked the one I prefer. If
there's some consensus that we switch the bit-shifting to hex
constants for other bitflag defines then I'll change it.

I think "result cache nodes" should be added here:

doc/src/sgml/config.sgml- <para>
doc/src/sgml/config.sgml- Hash-based operations are generally more sensitive to memory
doc/src/sgml/config.sgml- availability than equivalent sort-based operations. The
doc/src/sgml/config.sgml- memory available for hash tables is computed by multiplying
doc/src/sgml/config.sgml- <varname>work_mem</varname> by
doc/src/sgml/config.sgml: <varname>hash_mem_multiplier</varname>. This makes it
doc/src/sgml/config.sgml- possible for hash-based operations to use an amount of memory
doc/src/sgml/config.sgml- that exceeds the usual <varname>work_mem</varname> base
doc/src/sgml/config.sgml- amount.
doc/src/sgml/config.sgml- </para>

I'd say it would be better to mention it in the previous paragraph.

I've done that. It now looks like:

Hash tables are used in hash joins, hash-based aggregation, result
cache nodes and hash-based processing of <literal>IN</literal>
subqueries.
</para>

Likely setops should be added to that list too, but not by this patch.
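
To spell out what that paragraph describes, with arbitrary example values
(these are not settings used anywhere in this thread):

    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;
    -- each hash-based operation may now use up to 64MB * 2.0 = 128MB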

Language fixen follow:

+ * Initialize the hash table to empty.

as empty

Perhaps, but I've kept the "to empty" as it's used in
nodeRecursiveunion.c and nodeSetOp.c to do the same thing. If you
propose a patch that gains traction to change those instances, then
I'll switch this one too.

I'm just in the middle of considering some other changes to the patch
and will post an updated version once I'm done with that.

David

#80David Rowley
dgrowleyml@gmail.com
In reply to: Konstantin Knizhnik (#76)
7 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, 11 Dec 2020 at 05:44, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

I tested the patched version of Postgres on JOBS benchmark:

https://github.com/gregrahn/join-order-benchmark

For most queries performance is the same, some queries are executed
faster, but one query is 150 times slower:

I set up my AMD 3990x machine here to run the join order benchmark. I
used a shared_buffers of 20GB so that all the data would fit in there.
work_mem was set to 256MB.

I used imdbpy2sql.py to parse the imdb database files and load the
data into PostgreSQL. This seemed to work okay, except that the
movie_info_idx table appeared to be missing. Many of the 113 join
order benchmark queries need this table. Without that table, only 71
of the queries can run. I've not yet investigated why the table was
not properly created and loaded.

I performed 5 different sets of tests using master at 9522085a, and
master with the attached series of patches applied.

Tests:
* Test 1 uses the standard setting of 100 for
default_statistics_target and has parallel query disabled.
* Test 2 again uses 100 for the default_statistics_target but enables
parallel query.
* Test 3 increases default_statistics_target to 10000 (then ANALYZE)
and disables parallel query.
* Test 4 as test 3 but with parallel query enabled.
* Test 5 changes the cost model for Result Cache so that instead of
using a result cache based on the estimated number of cache hits, the
costing is simplified to inject a Result Cache node into a parameterised
nested loop if the n_distinct estimate of the nested loop parameters
is less than half the row estimate of the outer plan.
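
For reproducibility, the non-default settings for the tests above roughly
amount to the following (this is just a sketch rather than an exact
transcript of what I ran):

    -- Tests 3 and 4: larger statistics target, then re-gather stats
    SET default_statistics_target = 10000;
    ANALYZE;

    -- Tests 1 and 3: disable parallel query
    SET max_parallel_workers_per_gather = 0;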

I ran each query using pgbench for 20 seconds.

Test 1:

18 of the 71 queries used a Result Cache node. Overall the runtime of
those queries was reduced by 12.5% using v13 when compared to master.

Over all 71 queries, the total time to parse/plan/execute was reduced
by 7.95%.

Test 2:

Again 18 queries used a Result Cache. The speedup was about 2.2% for
just those 18 and 2.1% over the 71 queries.

Test 3:

9 queries used a Result Cache. The speedup was 3.88% for those 9
queries and 0.79% over the 71 queries.

Test 4:

8 of the 71 queries used a Result Cache. The speedup was 4.61% over
those 8 queries and 4.53% over the 71 queries.

Test 5:

Saw 15 queries using a Result Cache node. These 15 ran 5.95% faster
than master and over all of the 71 queries, the benchmark was 0.32%
faster.

I see some of the queries take quite a bit of query planner effort
due to the large number of joins. Some of the faster-to-execute
queries here took a little longer overall because of that.

The reason I increased the statistics targets to 10k was that I
noticed in test 2 that queries 15c and 15d became slower. After
checking the n_distinct estimate for the Result Cache
key column I found that the estimate was significantly out when
compared to the actual n_distinct. Manually correcting the n_distinct
caused the planner to move away from using a Result Cache for those
queries. However, I thought I'd check if increasing the statistics
targets allowed a better n_distinct estimate due to the larger number
of blocks being sampled. It did.
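
For anyone wanting to check or override an n_distinct estimate in the same
way, the sort of thing involved is below; the table and column names are
just placeholders rather than the actual JOB columns:

    -- see what ANALYZE currently thinks
    SELECT n_distinct FROM pg_stats
    WHERE tablename = 'some_table' AND attname = 'some_column';

    -- manually override the estimate, then re-ANALYZE so the planner sees it
    ALTER TABLE some_table ALTER COLUMN some_column SET (n_distinct = 100000);
    ANALYZE some_table;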

I've attached a spreadsheet with the results of each of the tests.

The attached file v13_costing_hacks.patch.txt is the quick and dirty
patch I put together to run test 5.

David

Attachments:

v13-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch (text/plain)
From 74ded70089a5030f7f1e932a1f2dffbf7ecef6fa Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v13 1/5] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits that pass back additional details to
the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ce42ce3f1..43eca1f509 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3067,7 +3067,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..aaff28ac52 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index adf68d8790..81fb87500b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4792,7 +4794,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9be0c4a6af..86e26dad54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1684,6 +1684,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 47ca4ddbb5..d37faee446 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

v13-0004-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain)
From 45dd6bb50fd278d5a8580acc24736ab260d3f3f3 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v13 4/5] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 4ff8000003..4d6cd9ecfe 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns the cache entry
+ *		belonging to 'entry', which may have changed address by shuffling the
+ *		deleted entries back to their optimal position.  Returns NULL if the
+ *		attempt to free enough memory resulted in 'entry' itself being evicted
+ *		from the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

v13-0005-Use-a-Result-Cache-node-to-cache-results-from-su.patch (text/plain)
From 02e2ef549ff2ddd28bc50d75f08b71eba710be35 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Fri, 4 Dec 2020 00:39:48 +1300
Subject: [PATCH v13 5/5] Use a Result Cache node to cache results from
 subplans

---
 .../postgres_fdw/expected/postgres_fdw.out    |  49 +++++----
 src/backend/optimizer/plan/subselect.c        | 103 ++++++++++++++++++
 src/test/regress/expected/aggregates.out      |   6 +-
 src/test/regress/expected/groupingsets.out    |  20 ++--
 .../regress/expected/incremental_sort.out     |  16 ++-
 src/test/regress/expected/join.out            |  16 +--
 src/test/regress/expected/join_hash.out       |  58 +++++++---
 src/test/regress/expected/resultcache.out     |  37 +++++++
 src/test/regress/expected/rowsecurity.out     |  20 ++--
 src/test/regress/expected/select_parallel.out |  28 +++--
 src/test/regress/expected/subselect.out       |  20 ++--
 src/test/regress/sql/resultcache.sql          |   9 ++
 12 files changed, 297 insertions(+), 85 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 908b6cdc40..9dec821c05 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2123,22 +2123,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2919,10 +2922,13 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
                Relations: Aggregate on (public.ft2 t2)
                Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                 ->  Result Cache
+                       Output: ((count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10)))))
+                       Cache Key: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                       ->  Foreign Scan on public.ft1 t1
+                             Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -2933,8 +2939,8 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft
 -- Inner query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                                      QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                         
+------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique
    Output: ((SubPlan 1))
    ->  Sort
@@ -2944,11 +2950,14 @@ select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) fro
                Output: (SubPlan 1)
                Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
                SubPlan 1
-                 ->  Foreign Scan
+                 ->  Result Cache
                        Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Relations: Aggregate on (public.ft1 t1)
-                       Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+                       Cache Key: t2.c2, t2.c1
+                       ->  Foreign Scan
+                             Output: (count(t1.c1) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                             Relations: Aggregate on (public.ft1 t1)
+                             Remote SQL: SELECT count("C 1") FILTER (WHERE (($1::integer = 6) AND ($2::integer < 10))) FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(16 rows)
 
 select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92ad54e41e..bd648f66b3 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -37,6 +37,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 
 
 typedef struct convert_testexpr_context
@@ -137,6 +138,74 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
 	*colcollation = InvalidOid;
 }
 
+
+/*
+ * outer_params_hashable
+ *		Determine if it's valid to use a ResultCache node to cache already
+ *		seen rows matching a given set of parameters instead of performing a
+ *		rescan of the subplan pointed to by 'subroot'.  If it's valid, check
+ *		if all parameters required by this query level can be hashed.  If so,
+ *		return true and set 'operators' to the list of hash equality operators
+ *		for the given parameters then populate 'param_exprs' with each
+ *		PARAM_EXEC parameter that the subplan requires the outer query to pass
+ *		it.  When hashing is not possible, false is returned and the two
+ *		output lists are unchanged.
+ */
+static bool
+outer_params_hashable(PlannerInfo *subroot, List *plan_params, List **operators,
+					  List **param_exprs)
+{
+	List	   *oplist = NIL;
+	List	   *exprlist = NIL;
+	ListCell   *lc;
+
+	/* Ensure we're not given a top-level query. */
+	Assert(subroot->parent_root != NULL);
+
+	/*
+	 * It's not valid to use a Result Cache node if there are any volatile
+	 * functions in the subquery.  Caching could cause fewer evaluations of
+	 * volatile functions that have side effects.
+	 */
+	if (contain_volatile_functions((Node *) subroot->parse))
+		return false;
+
+	foreach(lc, plan_params)
+	{
+		PlannerParamItem *ppi = (PlannerParamItem *) lfirst(lc);
+		TypeCacheEntry *typentry;
+		Node	   *expr = ppi->item;
+		Param	   *param;
+
+		param = makeNode(Param);
+		param->paramkind = PARAM_EXEC;
+		param->paramid = ppi->paramId;
+		param->paramtype = exprType(expr);
+		param->paramtypmod = exprTypmod(expr);
+		param->paramcollid = exprCollation(expr);
+		param->location = -1;
+
+		typentry = lookup_type_cache(param->paramtype,
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(oplist);
+			list_free(exprlist);
+			return false;
+		}
+
+		oplist = lappend_oid(oplist, typentry->eq_opr);
+		exprlist = lappend(exprlist, param);
+	}
+
+	*operators = oplist;
+	*param_exprs = exprlist;
+
+	return true;				/* all params can be hashed */
+}
+
 /*
  * Convert a SubLink (as created by the parser) into a SubPlan.
  *
@@ -234,6 +303,40 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
 	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
 
+	/*
+	 * When enabled, for parameterized EXPR_SUBLINKS, we add a ResultCache to
+	 * the top of the subplan in order to cache previously looked up results
+	 * in the hope that they'll be needed again by a subsequent call.  At this
+	 * stage we don't have any details of how often we'll be called or with
+	 * which values we'll be called, so for now, we add the Result Cache
+	 * regardless.  It may be better to do this only when it seems likely
+	 * that we'll get some repeat lookups, i.e. cache hits.
+	 */
+	if (enable_resultcache && plan_params != NIL && subLinkType == EXPR_SUBLINK)
+	{
+		List	   *operators;
+		List	   *param_exprs;
+
+		/* Determine if all the subplan parameters can be hashed */
+		if (outer_params_hashable(subroot, plan_params, &operators, &param_exprs))
+		{
+			ResultCachePath *rcpath;
+
+			/*
+			 * Pass -1 for the number of calls since we don't have any idea
+			 * what that'll be.
+			 */
+			rcpath = create_resultcache_path(root,
+											 best_path->parent,
+											 best_path,
+											 param_exprs,
+											 operators,
+											 false,
+											 -1);
+			best_path = (Path *) rcpath;
+		}
+	}
+
 	plan = create_plan(subroot, best_path);
 
 	/* And convert to SubPlan or InitPlan format. */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1eb0f7346b..cc4cac7bf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1004,12 +1004,14 @@ explain (costs off)
 -----------------------------------------------------------------------------------------
  Seq Scan on int4_tbl
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: int4_tbl.f1
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+           ->  Result
+(9 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 7c844c6e09..33befe0e7b 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -774,19 +774,21 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: "*VALUES*".column1
    ->  Values Scan on "*VALUES*"
          SubPlan 1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+           ->  Result Cache
+                 Cache Key: "*VALUES*".column1
+                 ->  Aggregate
+                       Group Key: ()
+                       Filter: "*VALUES*".column1
+                       ->  Result
+                             One-Time Filter: "*VALUES*".column1
+                             ->  Seq Scan on gstest2
+(12 rows)
 
 -- HAVING with GROUPING queries
 select ten, grouping(ten) from onek
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index a8cbfd9f5f..b3cf302af7 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1568,9 +1568,11 @@ from tenk1 t, generate_series(1, 1000);
                      ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t
                      ->  Function Scan on generate_series
                SubPlan 1
-                 ->  Index Only Scan using tenk1_unique1 on tenk1
-                       Index Cond: (unique1 = t.unique1)
-(11 rows)
+                 ->  Result Cache
+                       Cache Key: t.unique1
+                       ->  Index Only Scan using tenk1_unique1 on tenk1
+                             Index Cond: (unique1 = t.unique1)
+(13 rows)
 
 explain (costs off) select
   unique1,
@@ -1587,9 +1589,11 @@ order by 1, 2;
                ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t
                ->  Function Scan on generate_series
          SubPlan 1
-           ->  Index Only Scan using tenk1_unique1 on tenk1
-                 Index Cond: (unique1 = t.unique1)
-(10 rows)
+           ->  Result Cache
+                 Cache Key: t.unique1
+                 ->  Index Only Scan using tenk1_unique1 on tenk1
+                       Index Cond: (unique1 = t.unique1)
+(12 rows)
 
 -- Parallel sort but with expression not available until the upper rel.
 explain (costs off) select distinct sub.unique1, stringu1 || random()::text
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5e6b02cdd7..0cde696292 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2976,8 +2976,8 @@ select * from
 where
   1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1)
 order by 1,2;
-                        QUERY PLAN                         
------------------------------------------------------------
+                           QUERY PLAN                            
+-----------------------------------------------------------------
  Sort
    Sort Key: t1.q1, t1.q2
    ->  Hash Left Join
@@ -2987,11 +2987,13 @@ order by 1,2;
          ->  Hash
                ->  Seq Scan on int8_tbl t2
          SubPlan 1
-           ->  Limit
-                 ->  Result
-                       One-Time Filter: ((42) IS NOT NULL)
-                       ->  Seq Scan on int8_tbl t3
-(13 rows)
+           ->  Result Cache
+                 Cache Key: (42)
+                 ->  Limit
+                       ->  Result
+                             One-Time Filter: ((42) IS NOT NULL)
+                             ->  Seq Scan on int8_tbl t3
+(15 rows)
 
 select * from
   int8_tbl t1 left join
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 3a91c144a2..9f04684fcd 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -923,27 +923,42 @@ WHERE
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          Filter: ((SubPlan 4) < 50)
          SubPlan 4
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    ->  Hash
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          ->  Seq Scan on public.hjtest_2
                Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
                Filter: ((SubPlan 5) < 55)
                SubPlan 5
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_2.c * 5)
+                       Cache Key: hjtest_2.c
+                       ->  Result
+                             Output: (hjtest_2.c * 5)
          SubPlan 1
-           ->  Result
+           ->  Result Cache
                  Output: 1
-                 One-Time Filter: (hjtest_2.id = 1)
+                 Cache Key: hjtest_2.id
+                 ->  Result
+                       Output: 1
+                       One-Time Filter: (hjtest_2.id = 1)
          SubPlan 3
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_1.b * 5)
-(28 rows)
+           Cache Key: hjtest_1.b
+           ->  Result
+                 Output: (hjtest_1.b * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_1, hjtest_2
@@ -977,27 +992,42 @@ WHERE
          Output: hjtest_2.a, hjtest_2.tableoid, hjtest_2.id, hjtest_2.c, hjtest_2.b
          Filter: ((SubPlan 5) < 55)
          SubPlan 5
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_2.c * 5)
+                 Cache Key: hjtest_2.c
+                 ->  Result
+                       Output: (hjtest_2.c * 5)
    ->  Hash
          Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
          ->  Seq Scan on public.hjtest_1
                Output: hjtest_1.a, hjtest_1.tableoid, hjtest_1.id, hjtest_1.b
                Filter: ((SubPlan 4) < 50)
                SubPlan 4
-                 ->  Result
+                 ->  Result Cache
                        Output: (hjtest_1.b * 5)
+                       Cache Key: hjtest_1.b
+                       ->  Result
+                             Output: (hjtest_1.b * 5)
          SubPlan 2
-           ->  Result
+           ->  Result Cache
                  Output: (hjtest_1.b * 5)
+                 Cache Key: hjtest_1.b
+                 ->  Result
+                       Output: (hjtest_1.b * 5)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: 1
-           One-Time Filter: (hjtest_2.id = 1)
+           Cache Key: hjtest_2.id
+           ->  Result
+                 Output: 1
+                 One-Time Filter: (hjtest_2.id = 1)
    SubPlan 3
-     ->  Result
+     ->  Result Cache
            Output: (hjtest_2.c * 5)
-(28 rows)
+           Cache Key: hjtest_2.c
+           ->  Result
+                 Output: (hjtest_2.c * 5)
+(43 rows)
 
 SELECT hjtest_1.a a1, hjtest_2.a a2,hjtest_1.tableoid::regclass t1, hjtest_2.tableoid::regclass t2
 FROM hjtest_2, hjtest_1
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
index c8706110c3..2950b674bc 100644
--- a/src/test/regress/expected/resultcache.out
+++ b/src/test/regress/expected/resultcache.out
@@ -151,3 +151,40 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+                                explain_resultcache                                
+-----------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+   Recheck Cond: (unique1 < 1000)
+   Heap Blocks: exact=333
+   ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+         Index Cond: (unique1 < 1000)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=1000)
+           Cache Key: t1.twenty
+           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=20)
+                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
+                       Filter: (twenty = t1.twenty)
+                       Rows Removed by Filter: 9500
+(13 rows)
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
+   SubPlan 1
+     ->  Result Cache (actual rows=1 loops=10000)
+           Cache Key: t1.thousand
+           Hits: 9000  Misses: 1000  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+           ->  Aggregate (actual rows=1 loops=1000)
+                 ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2 (actual rows=10 loops=1000)
+                       Index Cond: (thousand = t1.thousand)
+                       Heap Fetches: 0
+(9 rows)
+
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b9a58be7ad 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1477,18 +1477,20 @@ SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
 (3 rows)
 
 EXPLAIN (COSTS OFF) SELECT (SELECT x FROM s1 LIMIT 1) xx, * FROM s2 WHERE y like '%28%';
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Seq Scan on s2
    Filter: (((x % 2) = 0) AND (y ~~ '%28%'::text))
    SubPlan 2
-     ->  Limit
-           ->  Seq Scan on s1
-                 Filter: (hashed SubPlan 1)
-                 SubPlan 1
-                   ->  Seq Scan on s2 s2_1
-                         Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
-(9 rows)
+     ->  Result Cache
+           Cache Key: s2.x
+           ->  Limit
+                 ->  Seq Scan on s1
+                       Filter: (hashed SubPlan 1)
+                       SubPlan 1
+                         ->  Seq Scan on s2 s2_1
+                               Filter: (((x % 2) = 0) AND (y ~~ '%af%'::text))
+(11 rows)
 
 SET SESSION AUTHORIZATION regress_rls_alice;
 ALTER POLICY p2 ON s2 USING (x in (select a from s1 where b like '%d2%'));
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 9b0c418db7..a3caf95c8d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -148,14 +148,18 @@ explain (costs off)
                ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
                ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
    SubPlan 2
-     ->  Result
+     ->  Result Cache
+           Cache Key: max((SubPlan 1))
+           ->  Result
    SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+     ->  Result Cache
+           Cache Key: pa2.a
+           ->  Append
+                 ->  Seq Scan on part_pa_test_p1 pa1_1
+                       Filter: (a = pa2.a)
+                 ->  Seq Scan on part_pa_test_p2 pa1_2
+                       Filter: (a = pa2.a)
+(18 rows)
 
 drop table part_pa_test;
 -- test with leader participation disabled
@@ -1168,9 +1172,11 @@ SELECT 1 FROM tenk1_vw_sec
          Workers Planned: 4
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
    SubPlan 1
-     ->  Aggregate
-           ->  Seq Scan on int4_tbl
-                 Filter: (f1 < tenk1_vw_sec.unique1)
-(9 rows)
+     ->  Result Cache
+           Cache Key: tenk1_vw_sec.unique1
+           ->  Aggregate
+                 ->  Seq Scan on int4_tbl
+                       Filter: (f1 < tenk1_vw_sec.unique1)
+(11 rows)
 
 rollback;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index c7986fb7fc..249f76cacc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -968,19 +968,25 @@ explain (verbose, costs off)
 explain (verbose, costs off)
   select x, x from
     (select (select now() where y=y) as x from (values(1),(2)) v(y)) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Values Scan on "*VALUES*"
    Output: (SubPlan 1), (SubPlan 2)
    SubPlan 1
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
    SubPlan 2
-     ->  Result
+     ->  Result Cache
            Output: now()
-           One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
-(10 rows)
+           Cache Key: "*VALUES*".column1
+           ->  Result
+                 Output: now()
+                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
+(16 rows)
 
 explain (verbose, costs off)
   select x, x from
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
index b352f21ba1..edbddfb1b4 100644
--- a/src/test/regress/sql/resultcache.sql
+++ b/src/test/regress/sql/resultcache.sql
@@ -76,3 +76,12 @@ WHERE t1.unique1 < 1000;', false);
 RESET min_parallel_table_scan_size;
 RESET parallel_setup_cost;
 RESET parallel_tuple_cost;
+
+-- Ensure we get the expected plan with sub plans.
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.twenty = t1.twenty)
+FROM tenk1 t1 WHERE t1.unique1 < 1000;', false);
+
+SELECT explain_resultcache('
+SELECT unique1, (SELECT count(*) FROM tenk1 t2 WHERE t2.thousand = t1.thousand)
+FROM tenk1 t1;', false);
-- 
2.27.0

v13-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch
From 93f9534a457cec72b5d0bcd5dcb7a72a8317c3f0 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v13 2/5] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
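
To illustrate the idea outside of simplehash.h, here is a minimal
standalone sketch of tombstone-free "delete by entry pointer" in a
linear-probing table.  Everything in it (the fixed table size, the
names, the plain linear-probing insert) is made up for the example;
simplehash's own SH_DELETE_ITEM below gets away with a simpler stop
test thanks to how it keeps colliding entries ordered.

#include <stdio.h>
#include <stdint.h>

#define TBLSIZE 8				/* must be a power of two; illustrative only */

typedef struct Entry
{
	uint32_t	key;
	int			in_use;
} Entry;

static Entry table[TBLSIZE];

/* optimal ("home") bucket for a key */
static uint32_t
home(uint32_t key)
{
	return key & (TBLSIZE - 1);
}

/* linear-probing insert, returns a pointer to the stored entry */
static Entry *
insert(uint32_t key)
{
	uint32_t	i = home(key);

	while (table[i].in_use)
		i = (i + 1) & (TBLSIZE - 1);
	table[i].key = key;
	table[i].in_use = 1;
	return &table[i];
}

/*
 * Delete by entry pointer: no prior lookup and no tombstone.  Any entry
 * whose probe path passes through the hole is shifted back into it.
 */
static void
delete_item(Entry *entry)
{
	uint32_t	i = (uint32_t) (entry - table);
	uint32_t	j = i;

	for (;;)
	{
		uint32_t	k;

		j = (j + 1) & (TBLSIZE - 1);
		if (!table[j].in_use)
			break;				/* end of probe chain, nothing more to shift */
		k = home(table[j].key);
		/* move table[j] into the hole unless its home lies in (i, j] */
		if ((j > i) ? (k <= i || k > j) : (k <= i && k > j))
		{
			table[i] = table[j];
			i = j;
		}
	}
	table[i].in_use = 0;
}

int
main(void)
{
	Entry	   *e = insert(1);	/* home bucket 1 */

	insert(9);					/* also home bucket 1; probes into bucket 2 */
	delete_item(e);				/* 9 is shifted back into bucket 1 */
	printf("bucket 1 holds %u\n", table[1].key);
	return 0;
}

Deleting by entry pointer this way avoids both the extra lookup and the
need for tombstones; the run of entries that has to be shifted is
normally short.
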
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v13-0003-Add-Result-Cache-executor-node.patch
From 6b8e8471a9d6ed71c1b85c8946c3736ddba911fd Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v13 3/5] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and, when we
require more space in the table for new entries, we remove entries,
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
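
As a rough illustration of how the hit ratio might feed into that cost
decision (the real formulas live in costsize.c and may differ; the
function and variable names below are placeholders, not part of the
patch):

#include <stdio.h>

/*
 * Illustrative only: estimate the fraction of rescans expected to be
 * served from the cache, given the expected number of calls, the
 * ndistinct estimate for the parameters, and how many entries are
 * expected to fit within work_mem.
 */
static double
est_hit_ratio(double calls, double ndistinct, double est_cache_entries)
{
	double		hit_ratio;

	if (calls <= 0.0 || ndistinct <= 0.0)
		return 0.0;

	/* every repeat of an already-seen parameter value could be a hit... */
	hit_ratio = (calls - ndistinct) / calls;

	/* ...but only for the fraction of distinct values we can keep cached */
	if (ndistinct > est_cache_entries)
		hit_ratio *= est_cache_entries / ndistinct;

	return hit_ratio < 0.0 ? 0.0 : hit_ratio;
}

int
main(void)
{
	/* 10000 rescans over 1000 distinct parameter values, all cacheable */
	printf("%.3f\n", est_hit_ratio(10000, 1000, 1000));	/* 0.900 */

	/* same, but only half of the distinct values fit in the cache */
	printf("%.3f\n", est_hit_ratio(10000, 1000, 500));	/* 0.450 */

	return 0;
}

The first case lines up with the Hits: 9000 / Misses: 1000 figures in
the resultcache regression output shown earlier, where all 1000
distinct values of t1.thousand fit in the cache (Evictions: Zero).
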
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  232 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2758 insertions(+), 176 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b09dce63f5..908b6cdc40 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1584,6 +1584,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1610,6 +1611,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 319c15d635..b3e89a7af1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -490,10 +490,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e17cdcc816..999ff9028e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1736,8 +1736,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes, and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4857,6 +4858,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..2cf2bc3712 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1280,6 +1282,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1971,6 +1976,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3044,6 +3053,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..41506c4e13 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..3e0508a1f4 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..4ff8000003
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used cache entry.  We
+ * remember the least recently used entry by always pushing new entries and
+ * entries we look up onto the tail of a doubly linked list.  This means that
+ * older items always bubble to the head of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion.  For example, a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Enable to check that the memory tracking code is behaving correctly */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Look up or add an entry for the current parameters.  No need to pass a
+	 * valid key since the hash function uses rcstate's probeslot, which we
+	 * populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
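+/*
+ * ExecResultCache
+ *		Scan function for the Result Cache node.  This is driven by a small
+ *		state machine.  Informally, the transitions handled below are:
+ *
+ *		RC_CACHE_LOOKUP -> RC_CACHE_FETCH_NEXT_TUPLE on a complete cache hit;
+ *		RC_CACHE_LOOKUP -> RC_FILLING_CACHE on a cache miss where the first
+ *		outer tuple was stored in the cache;
+ *		RC_CACHE_LOOKUP or RC_FILLING_CACHE -> RC_CACHE_BYPASS_MODE when the
+ *		current entry cannot be made to fit within the memory budget;
+ *		any state -> RC_END_OF_SCAN once there are no further tuples to
+ *		return for the current parameters.
+ */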
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we check whether we've seen the current
+				 * parameters before and, if so, whether we've already cached
+				 * a complete set of records that the outer plan will return
+				 * for these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves having a special case to
+	 * create the first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match. e.g. A join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that they can be picked up by the
+	 * main process and reported in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must look up the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65bbc18ecb..15a6a4e19e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -925,6 +925,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4980,6 +5007,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f5dcedf6e8..2ce54a526a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -834,6 +834,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1907,6 +1922,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3861,6 +3891,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4095,6 +4128,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4388aae71d..c58325e1fd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2191,6 +2191,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2877,6 +2897,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..41725baabc 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4027,6 +4027,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aaff28ac52..38d6ee11f5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2306,6 +2308,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
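+
+	/*
+	 * As a purely illustrative example of the above: with calls = 1000,
+	 * ndistinct = 200 and est_cache_entries = 150, we expect to hold 150 of
+	 * the 200 distinct parameter sets at any one time, giving evict_ratio =
+	 * 1.0 - 150.0 / 200.0 = 0.25 and hit_ratio = (1.0 / 200.0) * 150.0 -
+	 * (200.0 / 1000.0) = 0.55, i.e. around 55% of rescans are expected to be
+	 * served from the cache.
+	 */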
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4046,6 +4189,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..5d23a3f7d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,198 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, fetch the outer side exprs and check that there is a valid
+ * hashable equality operator for each one.  Returns true and sets the
+ * 'param_exprs' and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a ResultCache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop of 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node is likely to be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1674,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1683,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1853,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1878,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6c8305c977..a564c0e9d8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1510,6 +1523,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6344,6 +6407,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6930,6 +7015,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6975,6 +7061,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..9584cdb653 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -735,6 +735,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..92ad54e41e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2748,6 +2748,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 86e26dad54..3229f85978 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1547,6 +1547,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3847,6 +3897,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4065,6 +4126,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..07e5698a82 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1019,6 +1019,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index bd57e917e1..93ffb68c7a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -365,6 +365,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 758c3ca097..344ec8b84e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d65099c94a..cb1a4fd845 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1974,6 +1975,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs, nkeys in size */
+	Oid		   *collations;		/* collations for comparisons, nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..4ef182e3ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -239,6 +241,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..31931dfd8a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..5f0c408007 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1dd12d484e 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8dfc36a4e1..e9b4571426 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -78,6 +78,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..1eb0f7346b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2577,6 +2577,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2593,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..c8706110c3
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..cbf371017e 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0e1ef71dd..fd0de3199a 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -114,7 +114,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 081fce32e7..285de3e2c0 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -200,6 +200,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..b352f21ba1
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

Result_cache_v13_vs_master.ods (application/vnd.oasis.opendocument.spreadsheet)
v13_costing_hacks.patch.txt (text/plain; charset=US-ASCII)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5d23a3f7d4..c242c706f4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -24,6 +24,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
@@ -1675,14 +1676,8 @@ match_unsorted_outer(PlannerInfo *root,
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
 				Path	   *rcpath;
-
-				try_nestloop_path(root,
-								  joinrel,
-								  outerpath,
-								  innerpath,
-								  merge_pathkeys,
-								  jointype,
-								  extra);
+				EstimationInfo estinfo;
+				double		estgroups;
 
 				/*
 				 * Try generating a result cache path and see if that makes the
@@ -1691,14 +1686,41 @@ match_unsorted_outer(PlannerInfo *root,
 				rcpath = get_resultcache_path(root, innerrel, outerrel,
 											  innerpath, outerpath, jointype,
 											  extra);
-				if (rcpath != NULL)
+
+				if (rcpath == NULL)
 					try_nestloop_path(root,
 									  joinrel,
 									  outerpath,
-									  rcpath,
+									  innerpath,
 									  merge_pathkeys,
 									  jointype,
 									  extra);
+				else
+				{
+					estgroups = estimate_num_groups(root,
+													((ResultCachePath *) rcpath)->param_exprs,
+													outerpath->rows,
+													NULL,
+													&estinfo);
+
+					if (rcpath != NULL && estgroups < outerpath->rows / 2.0 &&
+						(estinfo.flags & SELFLAG_USED_DEFAULT) == 0)
+						try_nestloop_path(root,
+										  joinrel,
+										  outerpath,
+										  rcpath,
+										  merge_pathkeys,
+										  jointype,
+										  extra);
+					else
+						try_nestloop_path(root,
+										  joinrel,
+										  outerpath,
+										  innerpath,
+										  merge_pathkeys,
+										  jointype,
+										  extra);
+				}
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
#81David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#80)
5 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 3 Feb 2021 at 19:51, David Rowley <dgrowleyml@gmail.com> wrote:

> I've attached a spreadsheet with the results of each of the tests.

> The attached file v13_costing_hacks.patch.txt is the quick and dirty
> patch I put together to run test 5.

I've attached an updated set of patches. I'd forgotten to run make
check-world with the 0005 patch and that was making the CF bot
complain. I'm not intending 0005 for commit in the state that it's
in, so I've just dropped it.

I've also done some further performance testing with the attached set
of patches, this time focusing solely on planner performance using
the Join Order Benchmark. Some of the queries in this benchmark give
the planner quite a bit of exercise. Queries such as 29b take about
78 ms to plan on my 1-year-old, fairly powerful AMD hardware.

The attached spreadsheet shows the details of the results of these
tests. Skip to the "Test6 no parallel 100 stats EXPLAIN only" sheet.

To get these results I just ran pgbench for 10 seconds on each query
prefixed with "EXPLAIN ".

To summarise here, the planner performance gets a fair bit worse with
the patched code. With master, summing the average planning time over
each of the queries resulted in a total planning time of 765.7 ms.
After patching, that went up to 1097.5 ms. I was pretty disappointed
about that.

On looking into why the performance gets worse, there are a few
factors. One factor is that I'm adding a new path to consider, and if
that path sticks around then subsequent joins may consider it too.
After changing things around so that I only ever add the best path,
the time went down to 1067.4 ms. add_path() does tend to ditch
inferior paths anyway, so this may not really be a good thing to do.
Another thing that I picked up on was the code that checks whether a
Result Cache Path is legal to use: it must check whether the inner
side of the join contains any volatile functions. If I just comment
out those checks, then the total planning time goes down to 985.6 ms.
The estimate_num_groups() call that the costing for the ResultCache
path must make to estimate the cache hit ratio is another factor.
When that call is replaced with a constant value, the total planning
time goes down to 905.7 ms.

I can perhaps see ways that the volatile function checks could be
optimised a bit further, but the other stuff really is needed, so if
we want this, it seems the planner is going to become slightly
slower. That does not exactly fill me with joy. We currently have
enable_partitionwise_aggregate and enable_partitionwise_join, which
are both disabled by default because of the possibility of slowing
down the planner. One option could be to make enable_resultcache off
by default too. I don't much like that idea though, since anyone who
leaves the setting that way won't ever get any gains from caching the
inner side of parameterised nested loop results.

The idea I had to speed up the volatile function checks was along
similar lines to what parallel query does when it looks for
parallel-unsafe functions in the parse. Right now those checks are
only done under a few conditions where we think that parallel query
might actually be used (see standard_planner()). However, with Result
Cache, the volatile checks could be needed in many other cases too,
so we don't really have any means to short-circuit them. There might
be gains to be had by checking the parse once rather than having to
call contain_volatile_functions() in the various places we currently
do. I think both the parallel safety and volatile checks could then
be done in the same tree traversal. Anyway, I've not done any hacking
on this; it's just an idea so far.
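
To sketch the sort of thing I mean (all of the type, field and
function names below are made up purely for illustration; nothing
like this exists in the patchset, and a real version would want to
use check_functions_in_node() so that operators and other
function-calling nodes are covered, not just FuncExpr):

#include "postgres.h"

#include "catalog/pg_proc.h"
#include "nodes/nodeFuncs.h"
#include "nodes/parsenodes.h"
#include "nodes/primnodes.h"
#include "utils/lsyscache.h"

/* Hypothetical: both properties collected in a single traversal */
typedef struct ParsePropertiesContext
{
	bool		has_volatile_funcs;
	bool		has_parallel_unsafe_funcs;
} ParsePropertiesContext;

static bool
parse_properties_walker(Node *node, ParsePropertiesContext *context)
{
	if (node == NULL)
		return false;

	/* simplified: only looks at FuncExpr in this sketch */
	if (IsA(node, FuncExpr))
	{
		Oid			funcid = ((FuncExpr *) node)->funcid;

		if (func_volatile(funcid) == PROVOLATILE_VOLATILE)
			context->has_volatile_funcs = true;
		if (func_parallel(funcid) == PROPARALLEL_UNSAFE)
			context->has_parallel_unsafe_funcs = true;
	}

	/* recurse into sub-queries too */
	if (IsA(node, Query))
		return query_tree_walker((Query *) node, parse_properties_walker,
								 (void *) context, 0);

	return expression_tree_walker(node, parse_properties_walker,
								  (void *) context);
}

/*
 * Hypothetical entry point: called once, e.g. from standard_planner(),
 * with the results stashed somewhere like PlannerGlobal so that later
 * legality checks can consult them instead of re-walking expressions.
 */
static void
record_parse_properties(Query *parse, ParsePropertiesContext *context)
{
	context->has_volatile_funcs = false;
	context->has_parallel_unsafe_funcs = false;

	(void) query_tree_walker(parse, parse_properties_walker,
							 (void *) context, 0);
}

Very rough, but it shows the shape of doing both checks in one pass
over the parse rather than once per place that asks the question.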

Does anyone have any particular thoughts on the planner slowdown?

David

Attachments:

v14-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch (text/plain; charset=US-ASCII)
From 61a88b6beaa59b4421c0d6424db47b1c57bd7593 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v14 1/4] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits to allow it to pass back additional
details to the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d9d1..a116f637f4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3077,7 +3077,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..aaff28ac52 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index adf68d8790..81fb87500b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4792,7 +4794,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9be0c4a6af..86e26dad54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1684,6 +1684,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 47ca4ddbb5..d37faee446 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

v14-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch (text/plain; charset=US-ASCII)
From e34f3827b7bee35df7c8235f9e384f5045a2fc09 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v14 2/4] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an up-coming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v14-0003-Add-Result-Cache-executor-node.patch (text/plain; charset=US-ASCII)
From 4bf55bbe815fb411bd706d384eb4517b301090d2 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v14 3/4] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  232 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2758 insertions(+), 176 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e115d6..8b990f7162 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1834..d4cd137dd6 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4df1405d2e..dee2cc4baa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1736,8 +1736,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4857,6 +4858,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked-up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f80e379973..99c1160493 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1280,6 +1282,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1971,6 +1976,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3044,6 +3053,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set once we have freed memory, so we must use mem_used
+	 * when mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..41506c4e13 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function OIDs of the equality functions to use.
+ * Must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..3e0508a1f4 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..4ff8000003
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used cache entry.
+ * We track recency by pushing new entries, and entries we look up, onto the
+ * tail of a doubly linked list.  This means that the least recently used
+ * entries accumulate at the head of this LRU list, which is where eviction
+ * begins.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ *		The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we check whether we've seen the current
+				 * parameters before and, if so, whether we've already cached
+				 * the complete set of records that the outer plan returns
+				 * for these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark whether we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * can skip to the next outer tuple after getting the first matching
+	 * inner tuple.  In this case, the cache entry is complete after getting
+	 * the first tuple, and this flag allows us to mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65bbc18ecb..15a6a4e19e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -925,6 +925,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4980,6 +5007,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f5dcedf6e8..2ce54a526a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -834,6 +834,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1907,6 +1922,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3861,6 +3891,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4095,6 +4128,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4388aae71d..c58325e1fd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2191,6 +2191,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2877,6 +2897,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..41725baabc 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4027,6 +4027,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aaff28ac52..38d6ee11f5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2306,6 +2308,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
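+
+	/*
+	 * For example (illustrative numbers only), with 4MB of hash_mem and
+	 * entries estimated at roughly 1kB each, we'd expect to be able to keep
+	 * about 4096 entries in the cache at once.
+	 */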
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values exceeds the number of
+	 * entries we can store in the cache, we'll have to evict some entries.
+	 * That is not free, so here we estimate how often we'll incur the cost
+	 * of such an eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
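+
+	/*
+	 * A purely illustrative example of the two ratios above (numbers are
+	 * made up): with calls = 1000, ndistinct = 100 and est_cache_entries =
+	 * 50, evict_ratio = 1.0 - 50/100 = 0.5, reflecting that only half of the
+	 * distinct parameter values fit in the cache at once, and hit_ratio =
+	 * 50/100 - 100/1000 = 0.4, i.e. we expect about 40% of rescans to find
+	 * their result already cached.
+	 */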
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4046,6 +4189,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..5d23a3f7d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,198 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * We also fetch the outer-side exprs and check that each has a valid hashable
+ * equality operator.  Returns true and sets the 'param_exprs' and 'operators'
+ * output parameters if caching is possible.
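+ *
+ * For example (purely illustrative), given a parameterized join clause such
+ * as t2.y = t1.x where t2 is the inner rel, the outer-side expression t1.x
+ * becomes a cache key expression and the equality operator found for its
+ * type (typentry->eq_opr) is added to 'operators'.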
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a ResultCache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop of 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more appropriate.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1674,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1683,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1853,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1878,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6c8305c977..a564c0e9d8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1510,6 +1523,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6344,6 +6407,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6930,6 +7015,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6975,6 +7061,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..9584cdb653 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -735,6 +735,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..92ad54e41e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2748,6 +2748,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 86e26dad54..3229f85978 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1547,6 +1547,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3847,6 +3897,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4065,6 +4126,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..07e5698a82 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1019,6 +1019,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index db6db376eb..08c9871ccb 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -365,6 +365,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
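+
+/*
+ * A typical use for dlist_move_tail() is maintaining a least-recently-used
+ * list: moving an element to the tail each time it is accessed keeps the
+ * least recently used elements nearest the head, where they can be evicted
+ * first.
+ */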
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 943931f65d..e31ea90bf7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1981,6 +1982,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..4ef182e3ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -239,6 +241,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..31931dfd8a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..5f0c408007 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1dd12d484e 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8dfc36a4e1..e9b4571426 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -78,6 +78,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..1eb0f7346b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2577,6 +2577,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2593,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..c8706110c3
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..cbf371017e 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 12bb67e491..715551d157 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -114,7 +114,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 59b416fd80..d343fd907e 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -199,6 +199,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..b352f21ba1
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

v14-0004-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain)
From 6bee9c944230ab414c9f07871ffdf9ee6ee84ad6 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v14 4/4] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 4ff8000003..4d6cd9ecfe 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns the cache entry
+ *		belonging to 'entry', which may have changed address by shuffling the
+ *		deleted entries back to their optimal position.  Returns NULL if the
+ *		attempt to free enough memory resulted in 'entry' itself being evicted
+ *		from the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

Result_cache_v14_vs_master.ods (application/vnd.oasis.opendocument.spreadsheet)
#82Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#81)
1 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, Feb 16, 2021 at 6:16 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 3 Feb 2021 at 19:51, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached a spreadsheet with the results of each of the tests.

The attached file v13_costing_hacks.patch.txt is the quick and dirty
patch I put together to run test 5.

I've attached an updated set of patches. I'd forgotten to run make
check-world with the 0005 patch and that was making the CF bot
complain. I'm not intending 0005 for commit in the state that it's
in, so I've just dropped it.

I've also done some further performance testing with the attached set
of patches; this time I focused solely on planner performance using
the Join Order Benchmark. Some of the queries in this benchmark do
give the planner quite a bit of exercise. Queries such as 29b take my
1-year old, fairly powerful AMD hardware about 78 ms to make a plan
for.

The attached spreadsheet shows the details of the results of these
tests. Skip to the "Test6 no parallel 100 stats EXPLAIN only" sheet.

To get these results I just ran pgbench for 10 seconds on each query
prefixed with "EXPLAIN ".

To summarise here, the planner performance gets a fair bit worse with
the patched code. With master, summing the average planning time over
each of the queries resulted in a total planning time of 765.7 ms.
After patching, that went up to 1097.5 ms. I was pretty disappointed
about that.

On looking into why the performance gets worse, there's a few factors.
One factor is that I'm adding a new path to consider and if that path
sticks around then subsequent joins may consider that path. Changing
things around so I only ever add the best path, the time went down to
1067.4 ms. add_path() does tend to ditch inferior paths anyway, so
this may not really be a good thing to do. Another thing that I picked
up on was the code that checks if a Result Cache Path is legal to use,
it must check if the inner side of the join has any volatile
functions. If I just comment out those checks, then the total planning
time goes down to 985.6 ms. The estimate_num_groups() call that the
costing for the ResultCache path must do to estimate the cache hit
ratio is another factor. When replacing that call with a constant
value the total planning time goes down to 905.7 ms.

I can see perhaps ways that the volatile function checks could be
optimised a bit further, but the other stuff really is needed, so it
appears if we want this, then it seems like the planner is going to
become slightly slower. That does not exactly fill me with joy. We
currently have enable_partitionwise_aggregate and
enable_partitionwise_join which are both disabled by default because
of the possibility of slowing down the planner. One option could be
to make enable_resultcache off by default too. I'm not really liking
the idea of that much though since anyone who leaves the setting that
way won't ever get any gains from caching the inner side of
parameterised nested loop results.

The idea I had to speed up the volatile function call checks was along
similar lines to what parallel query does when it looks for parallel
unsafe functions in the parse. Right now those checks are only done
under a few conditions where we think that parallel query might
actually be used. (See standard_planner()). However, with Result
Cache, those could be used in many other cases too, so we don't really
have any means to short circuit those checks. There might be gains to
be had by checking the parse once rather than having to call
contains_volatile_functions in the various places we do call it. I
think both the parallel safety and volatile checks could then be done
in the same tree traverse. Anyway. I've not done any hacking on this.
It's just an idea so far.

Does anyone have any particular thoughts on the planner slowdown?

I used the same JOB test case and, testing with 19c.sql, I get a similar
result to yours (there are huge differences between master and v14).  I
think the reason is that we are trying the result cache path on a very hot
code path (the nested loop inner path), so the costing overhead adds up.
I see get_resultcache_path has a fast path to avoid calling
create_resultcache_path, but that check doesn't seem to go far enough.
Below is a small addition to it; with this, the planning time for 19c.sql
drops from 79ms to 52ms on my hardware.

+       /*
+        * If the inner path is cheap enough, no bother to try the result
+        * cache path. 20 is just an arbitrary value. This may reduce some
+        * planning time.
+        */
+       if (inner_path->total_cost < 20)
+               return NULL;

I used imdbpy2sql.py to parse the imdb database files and load the
data into PostgreSQL. This seemed to work okay apart from the
movie_info_idx table, which appeared to be missing. Many of the 113 join
order benchmark queries need this table.

I followed the steps in [1]https://github.com/gregrahn/join-order-benchmark and made some changes
with the attached patch. In the end I got 2367725 rows. But you are
probably running into a different problem, since none of my changes touch
the movie_info_idx table.

[1]: https://github.com/gregrahn/join-order-benchmark

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

0001-fix.patch (application/octet-stream)
From 555c6eb548e4c9ebe0fcc3fc6bfa114a318c5d6b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 21 Feb 2021 18:07:48 +0800
Subject: [PATCH] fix

---
 .../imdb/parser/sql/dbschema.py                      | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/alberanid-imdbpy-1f32da30dcae/imdb/parser/sql/dbschema.py b/alberanid-imdbpy-1f32da30dcae/imdb/parser/sql/dbschema.py
index 2f359fb..db209de 100644
--- a/alberanid-imdbpy-1f32da30dcae/imdb/parser/sql/dbschema.py
+++ b/alberanid-imdbpy-1f32da30dcae/imdb/parser/sql/dbschema.py
@@ -186,7 +186,7 @@ DB_SCHEMA = [
         # the alternateID attribute here will be ignored by SQLAlchemy.
         DBCol('id', INTCOL, notNone=True, alternateID=True),
         DBCol('name', UNICODECOL, notNone=True, index='idx_name', indexLen=6),
-        DBCol('imdbIndex', UNICODECOL, length=12, default=None),
+        DBCol('imdbIndex', UNICODECOL, default=None),
         DBCol('imdbID', INTCOL, default=None, index='idx_imdb_id'),
         DBCol('gender', STRINGCOL, length=1, default=None),
         DBCol('namePcodeCf', STRINGCOL, length=5, default=None,
@@ -204,7 +204,7 @@ DB_SCHEMA = [
         # from namePcodeNf.
         DBCol('id', INTCOL, notNone=True, alternateID=True),
         DBCol('name', UNICODECOL, notNone=True, index='idx_name', indexLen=6),
-        DBCol('imdbIndex', UNICODECOL, length=12, default=None),
+        DBCol('imdbIndex', UNICODECOL, default=None),
         DBCol('imdbID', INTCOL, default=None),
         DBCol('namePcodeNf', STRINGCOL, length=5, default=None,
                 index='idx_pcodenf'),
@@ -218,7 +218,7 @@ DB_SCHEMA = [
         # namePcodeSf is the soundex of the name plus the country code.
         DBCol('id', INTCOL, notNone=True, alternateID=True),
         DBCol('name', UNICODECOL, notNone=True, index='idx_name', indexLen=6),
-        DBCol('countryCode', UNICODECOL, length=255, default=None),
+        DBCol('countryCode', UNICODECOL, default=None),
         DBCol('imdbID', INTCOL, default=None),
         DBCol('namePcodeNf', STRINGCOL, length=5, default=None,
                 index='idx_pcodenf'),
@@ -237,7 +237,7 @@ DB_SCHEMA = [
         DBCol('id', INTCOL, notNone=True, alternateID=True),
         DBCol('title', UNICODECOL, notNone=True,
                 index='idx_title', indexLen=10),
-        DBCol('imdbIndex', UNICODECOL, length=12, default=None),
+        DBCol('imdbIndex', UNICODECOL, default=None),
         DBCol('kindID', INTCOL, notNone=True, foreignKey='KindType'),
         DBCol('productionYear', INTCOL, default=None),
         DBCol('imdbID', INTCOL, default=None, index="idx_imdb_id"),
@@ -264,7 +264,7 @@ DB_SCHEMA = [
         DBCol('personID', INTCOL, notNone=True, index='idx_person',
                 foreignKey='Name'),
         DBCol('name', UNICODECOL, notNone=True),
-        DBCol('imdbIndex', UNICODECOL, length=12, default=None),
+        DBCol('imdbIndex', UNICODECOL, default=None),
         DBCol('namePcodeCf',  STRINGCOL, length=5, default=None,
                 index='idx_pcodecf'),
         DBCol('namePcodeNf',  STRINGCOL, length=5, default=None,
@@ -291,7 +291,7 @@ DB_SCHEMA = [
         DBCol('movieID', INTCOL, notNone=True, index='idx_movieid',
                 foreignKey='Title'),
         DBCol('title', UNICODECOL, notNone=True),
-        DBCol('imdbIndex', UNICODECOL, length=12, default=None),
+        DBCol('imdbIndex', UNICODECOL, default=None),
         DBCol('kindID', INTCOL, notNone=True, foreignKey='KindType'),
         DBCol('productionYear', INTCOL, default=None),
         DBCol('phoneticCode',  STRINGCOL, length=5, default=None,
-- 
2.21.0

#83Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#81)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:

To summarise here, the planner performance gets a fair bit worse with
the patched code. With master, summing the average planning time over
each of the queries resulted in a total planning time of 765.7 ms.
After patching, that went up to 1097.5 ms. I was pretty disappointed
about that.

I have a couple of ideas:

- default enable_resultcache=off seems okay. In plenty of cases, planning
time is unimportant. This is the "low bar" - if we can do better and enable
it by default, that's great.

- Maybe this should be integrated into nestloop rather than being a separate
plan node. That means that it could be dynamically enabled during
execution, maybe after a few loops or after checking that there's at least
some minimal number of repeated keys and cache hits. cost_nestloop would
consider whether to use a result cache or not, and explain would show the
cache stats as a part of nested loop. In this case, I propose there'd still
be a GUC to disable it.

- Maybe cost_resultcache() can be split into initial_cost and final_cost
parts, same as for nestloop ? I'm not sure how it'd work, since
initial_cost is supposed to return a lower bound, and resultcache tries to
make things cheaper. initial_cost would just add some operator/tuple costs
to make sure that resultcache of a unique scan is more expensive than
nestloop alone. estimate_num_groups is at least O(n) WRT
rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) *
cpu_operator_cost in initial_cost and then call estimate_num_groups in
 final_cost. We'd be estimating the cost of estimating the cost... (A
 rough sketch of such a split follows this list.)

- Maybe an initial implementation of this would only add a result cache if the
best plan was already going to use a nested loop, even though a cached
nested loop might be cheaper than other plans. This would avoid most
planner costs, and give improved performance at execution time, but leaves
something "on the table" for the future.

+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+			Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;

...

+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+					&estinfo);

Shouldn't this pass "tuples" and not "calls" ?

--
Justin

#84Andy Fan
zhihui.fan1213@gmail.com
In reply to: Justin Pryzby (#83)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, Feb 22, 2021 at 9:21 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:

To summarise here, the planner performance gets a fair bit worse with
the patched code. With master, summing the average planning time over
each of the queries resulted in a total planning time of 765.7 ms.
After patching, that went up to 1097.5 ms. I was pretty disappointed
about that.

I have a couple of ideas:

- default enable_resultcache=off seems okay. In plenty of cases, planning
 time is unimportant. This is the "low bar" - if we can do better and enable
 it by default, that's great.

- Maybe this should be integrated into nestloop rather than being a separate
 plan node. That means that it could be dynamically enabled during
 execution, maybe after a few loops or after checking that there's at least
 some minimal number of repeated keys and cache hits. cost_nestloop would
 consider whether to use a result cache or not, and explain would show the
 cache stats as a part of nested loop.

+1 for this idea now. I have always wondered why there is no such node in
Oracle, even though Oracle is so aggressive about performance improvements
and this capability looks very promising. After seeing the costs in the
planner, I think planning time might be the answer (BTW, I am still not
sure whether Oracle did this).

In this case, I propose there'd still be a GUC to disable it.

- Maybe cost_resultcache() can be split into initial_cost and final_cost
 parts, same as for nestloop ? I'm not sure how it'd work, since
 initial_cost is supposed to return a lower bound, and resultcache tries to
 make things cheaper. initial_cost would just add some operator/tuple costs
 to make sure that resultcache of a unique scan is more expensive than
 nestloop alone. estimate_num_groups is at least O(n) WRT
 rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) *
 cpu_operator_cost in initial_cost and then call estimate_num_groups in
 final_cost. We'd be estimating the cost of estimating the cost...

- Maybe an initial implementation of this would only add a result cache if the
 best plan was already going to use a nested loop, even though a cached
 nested loop might be cheaper than other plans. This would avoid most
 planner costs, and give improved performance at execution time, but leaves
 something "on the table" for the future.

+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+			Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;

...

+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+					&estinfo);

Shouldn't this pass "tuples" and not "calls" ?

--
Justin

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#85Andres Freund
andres@anarazel.de
In reply to: David Rowley (#81)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2021-02-16 23:15:51 +1300, David Rowley wrote:

There might be gains to be had by checking the parse once rather than
having to call contains_volatile_functions in the various places we do
call it. I think both the parallel safety and volatile checks could
then be done in the same tree traverse. Anyway. I've not done any
hacking on this. It's just an idea so far.

ISTM that it could be worth doing that as part of preprocess_expression() -
it's a pass that we unconditionally do pretty early, it already computes
opfuncid, often already fetches the pg_proc entry (cf
simplify_function()), etc.

Except for the annoying issue that we pervasively use Lists as
expressions, I'd argue that we should actually cache "subtree
volatility" in Expr nodes, similar to the way we use OpExpr.opfuncid
etc. That'd allow us to make contain_volatile_functions() very cheap in
the majority of cases, but we could still easily invalidate that state
when necessary by setting "exprhasvolatile" to unknown (causing the next
contain_volatile_functions() to compute it from scratch).

But since we actually do use Lists as expressions (which do not inherit
from Expr), we'd instead need to pass a new param to
preprocess_expression() that stores the volatility somewhere in
PlannerInfo? Seems a bit yucky to manage :(.
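
Spelled out as a minimal sketch (the enum, the "exprhasvolatile" field in
Expr and the helper below are all assumptions for illustration, not
anything that exists in the tree):

typedef enum ExprVolatility
{
	EXPR_VOLATILITY_UNKNOWN,	/* not computed yet, or invalidated */
	EXPR_VOLATILITY_VOLATILE,
	EXPR_VOLATILITY_NOVOLATILE
} ExprVolatility;

/*
 * With such a field in Expr the common case becomes one comparison, and
 * invalidation is just resetting the field to UNKNOWN.  List expressions,
 * which are not Expr nodes, would still need the full walk - which is the
 * problem described above.
 */
static bool
expr_is_volatile(Expr *expr)
{
	if (expr->exprhasvolatile != EXPR_VOLATILITY_UNKNOWN)
		return expr->exprhasvolatile == EXPR_VOLATILITY_VOLATILE;

	return contain_volatile_functions((Node *) expr);	/* full tree walk */
}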

Greetings,

Andres Freund

#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#85)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Andres Freund <andres@anarazel.de> writes:

Except for the annoying issue that we pervasively use Lists as
expressions, I'd argue that we should actually cache "subtree
volatility" in Expr nodes, similar to the way we use OpExpr.opfuncid
etc. That'd allow us to make contain_volatile_functions() very cheap

... and completely break changing volatility with ALTER FUNCTION.
The case of OpExpr.opfuncid is okay only because we don't provide
a way to switch an operator's underlying function. (See also
9f1255ac8.)

It'd certainly be desirable to reduce the number of duplicated
function property lookups in the planner, but I'm not convinced
that that is a good way to go about it.

regards, tom lane

#87Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#86)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,

On 2021-02-22 20:51:17 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Except for the annoying issue that we pervasively use Lists as
expressions, I'd argue that we should actually cache "subtree
volatility" in Expr nodes, similar to the way we use OpExpr.opfuncid
etc. That'd allow us to make contain_volatile_functions() very cheap

... and completely break changing volatility with ALTER FUNCTION.
The case of OpExpr.opfuncid is okay only because we don't provide
a way to switch an operator's underlying function. (See also
9f1255ac8.)

Hm. I was imagining we'd only set it within the planner. If so, I don't
think it'd change anything around ALTER FUNCTION.

But anyway, due to the List* issue, I don't think it's a viable approach
as-is.

We could add a wrapper node around "planner expressions" that stores
metadata about them during planning, without those properties leaking
over expressions used at other times. E.g. having
preprocess_expression() return a PlannerExpr that points to the
expression as preprocess_expression returns it today. That'd make it
easy to cache information like volatility. But it also seems
prohibitively invasive :(.

It'd certainly be desirable to reduce the number of duplicated
function property lookups in the planner, but I'm not convinced
that that is a good way to go about it.

Do you have suggestions?

Greetings,

Andres Freund

#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#87)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Andres Freund <andres@anarazel.de> writes:

We could add a wrapper node around "planner expressions" that stores
metadata about them during planning, without those properties leaking
over expressions used at other times. E.g. having
preprocess_expression() return a PlannerExpr that points to the
expression as preprocess_expression() returns it today. That'd make it
easy to cache information like volatility. But it also seems
prohibitively invasive :(.

I doubt it's that bad. We could cache such info in RestrictInfo
for quals, or PathTarget for tlists, without much new notational
overhead. That doesn't cover everything the planner deals with
of course, but it would cover enough that you'd be chasing pretty
small returns to worry about more.
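
For illustration only (invented names, not the actual RestrictInfo fields):
the property gets computed once when the wrapper for the qual is built, and
every later check becomes a simple field read.

    #include <stdbool.h>
    #include <stdlib.h>

    /* Invented stand-ins; this is not RestrictInfo, just the general shape. */
    typedef struct QualInfoSketch
    {
        void       *clause;         /* the qual expression itself */
        bool        has_volatile;   /* filled in once, below */
    } QualInfoSketch;

    QualInfoSketch *
    make_qualinfo_sketch(void *clause, bool (*contains_volatile) (void *))
    {
        QualInfoSketch *qi = malloc(sizeof(QualInfoSketch));

        if (qi == NULL)
            return NULL;
        qi->clause = clause;
        /* pay for the expression tree walk once, at creation time */
        qi->has_volatile = contains_volatile(clause);
        return qi;
    }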

regards, tom lane

#89Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Tom Lane (#88)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, Feb 23, 2021 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

We could add a wrapper node around "planner expressions" that stores
metadata about them during planning, without those properties leaking
over expressions used at other times. E.g. having
preprocess_expression() return a PlannerExpr that points to the
expression as preprocess_expression() returns it today. That'd make it
easy to cache information like volatility. But it also seems
prohibitively invasive :(.

I doubt it's that bad. We could cache such info in RestrictInfo
for quals, or PathTarget for tlists, without much new notational
overhead. That doesn't cover everything the planner deals with
of course, but it would cover enough that you'd be chasing pretty
small returns to worry about more.

regards, tom lane

This patch set no longer applies
http://cfbot.cputube.org/patch_32_2569.log

Can we get a rebase?

I am marking the patch "Waiting on Author"

--
Ibrar Ahmed

#90David Rowley
dgrowleyml@gmail.com
In reply to: Ibrar Ahmed (#89)
4 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, 5 Mar 2021 at 00:16, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

This patch set no longer applies
http://cfbot.cputube.org/patch_32_2569.log

Can we get a rebase?

v14 should still apply. I think the problem is that the CFbot, at best,
can only try to apply the latest .patch files on the thread, in
alphabetical order of filename. The bot is likely just trying to apply
the unrelated patch that was posted since I posted v14.

I've attached the v14 version again. Hopefully, that'll make the CFbot happy.

I'm also working on another version of the patch with slightly
different planner code. I hope to reduce the additional planner
overheads a bit with it. It should arrive here in the next day or two.

David

Attachments:

v14-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch (text/plain)
From 61a88b6beaa59b4421c0d6424db47b1c57bd7593 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v14 1/4] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits to allow it to pass back additional
details to the caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d9d1..a116f637f4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3077,7 +3077,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..aaff28ac52 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1874,7 +1874,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index adf68d8790..81fb87500b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4792,7 +4794,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9be0c4a6af..86e26dad54 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1684,6 +1684,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 47ca4ddbb5..d37faee446 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

v14-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch (text/plain)
From e34f3827b7bee35df7c8235f9e384f5045a2fc09 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v14 2/4] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an up-coming commit, a caller
already has the element which it would like to delete, so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v14-0003-Add-Result-Cache-executor-node.patch (text/plain)
From 4bf55bbe815fb411bd706d384eb4517b301090d2 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v14 3/4] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  232 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2758 insertions(+), 176 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e115d6..8b990f7162 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1834..d4cd137dd6 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4df1405d2e..dee2cc4baa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1736,8 +1736,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4857,6 +4858,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans to the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f80e379973..99c1160493 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1280,6 +1282,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1971,6 +1976,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3044,6 +3053,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..d5724de69f 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..41506c4e13 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqFunctions: array of function oids of the equality functions to use
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..3e0508a1f4 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -319,6 +320,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -703,6 +709,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..4ff8000003
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len);
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+ /* ResultCacheTuple Stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ *
+ *		Returns true on success, or false if we were unable to free enough
+ *		space in the cache to store the new tuple.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
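+/*
+ * ExecResultCache
+ *		The main entry point for Result Cache nodes.  Returns tuples from the
+ *		cache when the current parameter values have been seen before,
+ *		otherwise executes the outer plan and fills the cache as we go.
+ */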
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to look up the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
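+
+	/*
+	 * 'tableslot' stores cache keys in MinimalTuple form, matching how they
+	 * are kept in the hash table, while 'probeslot' is a virtual slot that
+	 * we fill with the evaluated parameter expressions when probing.
+	 */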
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match.  e.g. a join operator performing a unique
+	 * join is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must look up the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
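+
+	/*
+	 * Note that we don't flush the cache here; retaining cached results
+	 * across rescans is the whole point of this node, since the new
+	 * parameter values may match ones we've already cached.
+	 */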
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
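+	/*
+	 * This accounts only for the per-entry bookkeeping structures; the size
+	 * of the cached tuple data itself is estimated separately by the
+	 * planner.
+	 */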
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
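+	/* This may be left NULL if the scan is not being instrumented */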
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65bbc18ecb..15a6a4e19e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -925,6 +925,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -4980,6 +5007,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f5dcedf6e8..2ce54a526a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -834,6 +834,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1907,6 +1922,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3861,6 +3891,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4095,6 +4128,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4388aae71d..c58325e1fd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2191,6 +2191,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2877,6 +2897,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..41725baabc 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4027,6 +4027,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aaff28ac52..38d6ee11f5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2306,6 +2308,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
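+
+	/*
+	 * For example, if all distinct parameter sets fit in the cache
+	 * (est_cache_entries >= ndistinct), then with ndistinct = 10 and
+	 * calls = 100 this gives 10/10 - 10/100 = 0.9, i.e. only the first scan
+	 * for each distinct parameter value is a cache miss.
+	 */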
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4046,6 +4189,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..5d23a3f7d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,198 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, we fetch the outer side exprs and check for a valid hashable
+ * equality operator for each outer expr.  Returns true and sets the
+ * 'param_exprs' and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	TypeCacheEntry *typentry;
+	ListCell   *lc;
+
+	/*
+	 * We can't use a ResultCache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget->exprs))
+		return false;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo->clause))
+			return false;
+	}
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* ppi_clauses should always meet this requirement */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) list_nth(opexpr->args, 0);
+			else
+				expr = (Node *) list_nth(opexpr->args, 1);
+
+			typentry = lookup_type_cache(exprType(expr),
+										 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+			/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+			if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			bms_free(var_relids);
+			continue;
+		}
+		bms_free(var_relids);
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* XXX will eq_opr ever be invalid if hash_proc isn't? */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We can hash, provided we found something to hash */
+	return (*operators != NIL);
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop of 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node is likely to be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1674,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1683,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1853,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1878,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6c8305c977..a564c0e9d8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -270,6 +273,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -444,6 +452,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1510,6 +1523,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6344,6 +6407,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -6930,6 +7015,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -6975,6 +7061,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..9584cdb653 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -735,6 +735,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..92ad54e41e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2748,6 +2748,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 86e26dad54..3229f85978 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1547,6 +1547,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3847,6 +3897,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4065,6 +4126,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..07e5698a82 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1019,6 +1019,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index db6db376eb..08c9871ccb 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -365,6 +365,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 943931f65d..e31ea90bf7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1981,6 +1982,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..4ef182e3ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -130,6 +131,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -239,6 +241,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..31931dfd8a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1456,6 +1456,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a node that
+ * caches tuples from a parameterized subpath to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..5f0c408007 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -760,6 +760,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1dd12d484e 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8dfc36a4e1..e9b4571426 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -78,6 +78,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..1eb0f7346b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2577,6 +2577,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2592,6 +2593,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..c8706110c3
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..cbf371017e 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 12bb67e491..715551d157 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -114,7 +114,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 59b416fd80..d343fd907e 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -199,6 +199,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..625c3e2e6e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1090,9 +1090,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..b352f21ba1
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

Attachment: v14-0004-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain)
From 6bee9c944230ab414c9f07871ffdf9ee6ee84ad6 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v14 4/4] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 4ff8000003..4d6cd9ecfe 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns 'entry', which may have
+ *		changed address if evicting other entries caused simplehash.h to
+ *		shuffle elements back towards their optimal bucket positions.  Returns
+ *		NULL if the attempt to free enough memory resulted in 'entry' itself
+ *		being evicted from the cache.
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

#91David Rowley
dgrowleyml@gmail.com
In reply to: Justin Pryzby (#83)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Thanks for these suggestions.

On Mon, 22 Feb 2021 at 14:21, Justin Pryzby <pryzby@telsasoft.com> wrote:

> On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:

> > To summarise here, the planner performance gets a fair bit worse with
> > the patched code. With master, summing the average planning time over
> > each of the queries resulted in a total planning time of 765.7 ms.
> > After patching, that went up to 1097.5 ms. I was pretty disappointed
> > about that.

I have a couple of ideas:

- default enable_resultcache=off seems okay. In plenty of cases, planning
time is unimportant. This is the "low bar" - if we can do better and enable
it by default, that's great.

I think that's reasonable. Teaching the planner to do new tricks is
never going to make the planner produce plans more quickly. When the
new planner trick gives us a more optimal plan, then great. When it
does not, then it's wasted effort. Giving users the ability to switch
the planner's new ability off seems like a good way to keep people
happy when they continually find that the additional effort costs more
than it saves.

- Maybe this should be integrated into nestloop rather than being a separate
plan node. That means that it could be dynamically enabled during
execution, maybe after a few loops or after checking that there's at least
some minimal number of repeated keys and cache hits. cost_nestloop would
consider whether to use a result cache or not, and explain would show the
cache stats as a part of nested loop. In this case, I propose there'd still
be a GUC to disable it.

There was quite a bit of discussion on that topic already on this
thread. I don't really want to revisit that.

The main problem with that is that we'd be forced into costing a
Nested loop with a result cache exactly the same as we do for a plain
nested loop. If we were to lower the cost to account for the cache
hits then the planner is more likely to choose a nested loop over a
merge/hash join. If we then switched the caching off during execution
due to low cache hits then that does not magically fix the bad choice
of join method. The planner may have gone with a Hash Join if it had
known the cache hit ratio would be that bad. We'd still be left to
deal with the poor performing nested loop. What you'd really want
instead of turning the cache off would be to have nested loop ditch
the parameter scan and just morph itself into a Hash Join node. (I'm
not proposing we do that)

- Maybe cost_resultcache() can be split into initial_cost and final_cost
parts, same as for nestloop ? I'm not sure how it'd work, since
initial_cost is supposed to return a lower bound, and resultcache tries to
make things cheaper. initial_cost would just add some operator/tuple costs
to make sure that resultcache of a unique scan is more expensive than
nestloop alone. estimate_num_groups is at least O(n) WRT
rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) *
cpu_operator_cost in initial_cost and then call estimate_num_groups in
final_cost. We'd be estimating the cost of estimating the cost...

The cost of the Result Cache is pretty dependent on the n_distinct
estimate. A low number of distinct values leads us to estimate a high
number of cache hits, whereas a large n_distinct value (relative to
the number of outer rows) is not going to produce an estimate with
many cache hits.

I don't think feeding in a fake value would help us here. We'd
probably do better if we had a fast way to determine if a given Expr
is unique (e.g. the UniqueKeys patch). Result Cache is never going to
be a win for a parameter whose value is never the same as any
previously seen value. Being able to detect that would likely allow us
to skip considering a Result Cache for the majority of OLTP-type
joins.

- Maybe an initial implementation of this would only add a result cache if the
best plan was already going to use a nested loop, even though a cached
nested loop might be cheaper than other plans. This would avoid most
planner costs, and give improved performance at execution time, but leaves
something "on the table" for the future.

+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+                     Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+     double          tuples = rcpath->subpath->rows;
+     double          calls = rcpath->calls;

...

+     /* estimate on the distinct number of parameter values */
+     ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+                                     &estinfo);

Shouldn't this pass "tuples" and not "calls" ?

hmm. I don't think so. "calls" is the estimated number of outer-side
rows. Here you're asking if the n_distinct estimate is relevant to
the inner-side rows. It's not. If we expect to be called 1000 times by
the outer side of the nested loop, then we need to know our n_distinct
estimate for those 1000 rows. If the estimate comes back as 10
distinct values and we see that we're likely to be able to fit all the
tuples for those 10 distinct values in the cache, then the hit ratio
is going to come out at 99%: 10 misses for the first lookup of each
value, and the remaining 990 calls will be hits. The number of tuples
(and the width of tuples) on the inside of the nested loop is only
relevant to calculating how many cache entries are likely to fit into
hash_mem. When we expect cache entries to be evicted, the cache hit
calculation becomes more complex.
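
As a standalone illustration of that arithmetic (the real calculation
lives in cost_resultcache_rescan() and also accounts for evictions;
the function and variable names below are mine, not the patch's):

#include <stdio.h>

static double
expected_hit_ratio(double calls, double ndistinct)
{
	/* each distinct value misses once; every later call for it is a hit */
	return (calls - ndistinct) / calls;
}

int
main(void)
{
	/* 1000 outer rows with 10 distinct parameter values => 0.99 (99%) */
	printf("hit ratio: %.2f\n", expected_hit_ratio(1000.0, 10.0));
	return 0;
}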

I've tried to explain what's going on in cost_resultcache_rescan() as
best I can with comments. I understand it's still pretty hard to
follow. I'm open to making it easier to understand if you have
suggestions.

David

#92David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#84)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 23 Feb 2021 at 14:22, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Feb 22, 2021 at 9:21 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

- Maybe this should be integrated into nestloop rather than being a separate
plan node. That means that it could be dynamically enabled during
execution, maybe after a few loops or after checking that there's at least
some minimal number of repeated keys and cache hits. cost_nestloop would
consider whether to use a result cache or not, and explain would show the
cache stats as a part of nested loop.

+1 for this idea now. I am always confused about why there is no such node in Oracle,
even though it is so aggressive about performance improvements and this feature
looks very promising. After seeing the costs in the planner, I think planning time
might be the answer (BTW, I am still not sure whether Oracle does this).

If you're voting for merging Result Cache with Nested Loop and making
it a single node, then that was already suggested on this thread. I
didn't really like the idea and I wasn't alone on that. Tom didn't
much like it either. Nevertheless, I went and coded it and found
that it made the whole thing slower.

There's nothing stopping Result Cache from switching itself off if it
sees poor cache hit ratios. It can then just become a proxy node,
effectively doing nothing apart from fetching from its own outer node
when asked for a tuple. It does not need to be part of Nested Loop to
have that ability.
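
For illustration, a minimal sketch of that proxy behaviour (the
caching_disabled field and the function name are hypothetical, not
from the patch):

static TupleTableSlot *
ExecResultCacheProxyMode(ResultCacheState *rcstate)
{
	/*
	 * Hypothetical: once the node decides the hit ratio is too poor, it
	 * skips the hash table entirely and acts as a pure proxy over its
	 * outer (sub)plan.
	 */
	if (rcstate->caching_disabled)
		return ExecProcNode(outerPlanState(rcstate));

	/* ... otherwise the normal cache lookup / fill path runs here ... */
	return NULL;
}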

David

#93David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#88)
6 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Tue, 23 Feb 2021 at 18:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I doubt it's that bad. We could cache such info in RestrictInfo
for quals, or PathTarget for tlists, without much new notational
overhead. That doesn't cover everything the planner deals with
of course, but it would cover enough that you'd be chasing pretty
small returns to worry about more.

This seems like a pretty good idea. So I coded it up.

The 0001 patch adds a has_volatile bool field to RestrictInfo and sets
it when building the RestrictInfo. I've also added has_volatile_expr
to PathTarget, which is set when the PathTarget is first built and then
maintained as new Exprs are added to it. I've modified a series of
existing calls to
contain_volatile_functions() to check these new fields first. This
seems pretty good even without the Result Cache patch as it saves a
few duplicate checks for volatile functions. For example, both
check_hashjoinable() and check_mergejoinable() call
contain_volatile_functions(). Now they just check the has_volatile
flag, since contain_volatile_functions() is called only once per
RestrictInfo, at the time the RestrictInfo is built.

I tested the performance of just 0001 against master and I did see the
overall planning and execution time of the join order benchmark query
29b go from taking 104.8 ms down to 103.7 ms.

For the Result Cache patch, I've coded it to make use of these new
fields instead of calling contain_volatile_functions().

I also noticed that I can use the pre-cached
RestrictInfo->hashjoinoperator field when it's set. This will be the
same operator as we'd be looking up using lookup_type_cache() anyway.

With Result Cache we can also cache the tuples from non-equality
joins, e.g. ON t1.x > t2.y, but we still need to look up the hash
equality operator in that case. I had thought that it might be worth
adding an additional resultcacheoperator field to RestrictInfo to save
having to look it up each time when hashjoinoperator is not set.
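
For illustration, a minimal sketch of that fallback (the function name
is made up; RestrictInfo->hashjoinoperator and lookup_type_cache() are
the real pieces being referred to):

static Oid
get_param_equality_op(RestrictInfo *rinfo, Expr *param_expr)
{
	TypeCacheEntry *typentry;

	/* Reuse the operator cached by check_hashjoinable(), when it's set */
	if (OidIsValid(rinfo->hashjoinoperator))
		return rinfo->hashjoinoperator;

	/*
	 * Non-equality join clause (e.g. t1.x > t2.y): fall back to the type's
	 * default equality operator from the typcache.
	 */
	typentry = lookup_type_cache(exprType((Node *) param_expr),
								 TYPECACHE_EQ_OPR);
	return typentry->eq_opr;
}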

We must still call estimate_num_groups() once each time we create a
ResultCachePath. That's required in order to estimate the cache hits.
All other join operators only care about clauselist_selectivity(). The
selectivity estimates for those are likely to be cached in the
RestrictInfo to save having to do it again next time. There's no
caching for estimate_num_groups(). I don't quite see any way to add
caching for this, however.

I've attached the updated patches.

Query 29b took 144.6 ms to plan and execute with v14; with v15 it
takes 128.5 ms. Master takes 104.8 ms (see attached graph). The caching has
improved the planning performance quite a bit. Thank you for the
suggestion.

David

Attachments:

v15-0001-Cache-PathTarget-and-RestrictInfo-s-volatility.patchtext/plain; charset=US-ASCII; name=v15-0001-Cache-PathTarget-and-RestrictInfo-s-volatility.patchDownload
From 67b00cae5c5c207b20cbb24fe6ccc555e2601f11 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Wed, 10 Mar 2021 22:57:33 +1300
Subject: [PATCH v15 1/5] Cache PathTarget and RestrictInfo's volatility

This aims to reduce the number of times we make calls to
contain_volatile_functions().  This really does not save us much with the
existing set of calls to contain_volatile_functions(), however, it will
save a significant number of calls in an upcoming patch which must check
this during the join search.
---
 src/backend/nodes/copyfuncs.c             |  1 +
 src/backend/nodes/outfuncs.c              |  2 ++
 src/backend/optimizer/path/allpaths.c     | 41 ++++++++++++-----------
 src/backend/optimizer/path/indxpath.c     | 10 +++---
 src/backend/optimizer/path/tidpath.c      | 12 ++++---
 src/backend/optimizer/plan/initsplan.c    | 10 +++---
 src/backend/optimizer/plan/planner.c      |  8 ++++-
 src/backend/optimizer/util/orclauses.c    | 11 +++---
 src/backend/optimizer/util/restrictinfo.c |  1 +
 src/backend/optimizer/util/tlist.c        |  7 ++++
 src/include/nodes/pathnodes.h             |  4 +++
 11 files changed, 69 insertions(+), 38 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index da91cbd2b1..493a856745 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2310,6 +2310,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_SCALAR_FIELD(can_join);
 	COPY_SCALAR_FIELD(pseudoconstant);
 	COPY_SCALAR_FIELD(leakproof);
+	COPY_SCALAR_FIELD(has_volatile);
 	COPY_SCALAR_FIELD(security_level);
 	COPY_BITMAPSET_FIELD(clause_relids);
 	COPY_BITMAPSET_FIELD(required_relids);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6493a03ff8..73dd2255af 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2473,6 +2473,7 @@ _outPathTarget(StringInfo str, const PathTarget *node)
 	WRITE_FLOAT_FIELD(cost.startup, "%.2f");
 	WRITE_FLOAT_FIELD(cost.per_tuple, "%.2f");
 	WRITE_INT_FIELD(width);
+	WRITE_BOOL_FIELD(has_volatile_expr);
 }
 
 static void
@@ -2497,6 +2498,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_BOOL_FIELD(can_join);
 	WRITE_BOOL_FIELD(pseudoconstant);
 	WRITE_BOOL_FIELD(leakproof);
+	WRITE_BOOL_FIELD(has_volatile);
 	WRITE_UINT_FIELD(security_level);
 	WRITE_BITMAPSET_FIELD(clause_relids);
 	WRITE_BITMAPSET_FIELD(required_relids);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..5ac993042e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -134,7 +134,8 @@ static void check_output_expressions(Query *subquery,
 static void compare_tlist_datatypes(List *tlist, List *colTypes,
 									pushdown_safety_info *safetyInfo);
 static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
-static bool qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
+static bool qual_is_pushdown_safe(Query *subquery, Index rti,
+								  RestrictInfo *rinfo,
 								  pushdown_safety_info *safetyInfo);
 static void subquery_push_qual(Query *subquery,
 							   RangeTblEntry *rte, Index rti, Node *qual);
@@ -2177,11 +2178,12 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		foreach(l, rel->baserestrictinfo)
 		{
 			RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
-			Node	   *clause = (Node *) rinfo->clause;
 
 			if (!rinfo->pseudoconstant &&
-				qual_is_pushdown_safe(subquery, rti, clause, &safetyInfo))
+				qual_is_pushdown_safe(subquery, rti, rinfo, &safetyInfo))
 			{
+				Node	   *clause = (Node *)rinfo->clause;
+
 				/* Push it down */
 				subquery_push_qual(subquery, rte, rti, clause);
 			}
@@ -3390,37 +3392,39 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
 }
 
 /*
- * qual_is_pushdown_safe - is a particular qual safe to push down?
+ * qual_is_pushdown_safe - is a particular rinfo safe to push down?
  *
- * qual is a restriction clause applying to the given subquery (whose RTE
+ * rinfo is a restriction clause applying to the given subquery (whose RTE
  * has index rti in the parent query).
  *
  * Conditions checked here:
  *
- * 1. The qual must not contain any SubPlans (mainly because I'm not sure
- * it will work correctly: SubLinks will already have been transformed into
- * SubPlans in the qual, but not in the subquery).  Note that SubLinks that
- * transform to initplans are safe, and will be accepted here because what
- * we'll see in the qual is just a Param referencing the initplan output.
+ * 1. rinfo's clause must not contain any SubPlans (mainly because it's
+ * unclear that it will work correctly: SubLinks will already have been
+ * transformed into SubPlans in the qual, but not in the subquery).  Note that
+ * SubLinks that transform to initplans are safe, and will be accepted here
+ * because what we'll see in the qual is just a Param referencing the initplan
+ * output.
  *
- * 2. If unsafeVolatile is set, the qual must not contain any volatile
+ * 2. If unsafeVolatile is set, rinfo's clause must not contain any volatile
  * functions.
  *
- * 3. If unsafeLeaky is set, the qual must not contain any leaky functions
- * that are passed Var nodes, and therefore might reveal values from the
- * subquery as side effects.
+ * 3. If unsafeLeaky is set, rinfo's clause must not contain any leaky
+ * functions that are passed Var nodes, and therefore might reveal values from
+ * the subquery as side effects.
  *
- * 4. The qual must not refer to the whole-row output of the subquery
+ * 4. rinfo's clause must not refer to the whole-row output of the subquery
  * (since there is no easy way to name that within the subquery itself).
  *
- * 5. The qual must not refer to any subquery output columns that were
+ * 5. rinfo's clause must not refer to any subquery output columns that were
  * found to be unsafe to reference by subquery_is_pushdown_safe().
  */
 static bool
-qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
+qual_is_pushdown_safe(Query *subquery, Index rti, RestrictInfo *rinfo,
 					  pushdown_safety_info *safetyInfo)
 {
 	bool		safe = true;
+	Node	   *qual = (Node *) rinfo->clause;
 	List	   *vars;
 	ListCell   *vl;
 
@@ -3429,8 +3433,7 @@ qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
 		return false;
 
 	/* Refuse volatile quals if we found they'd be unsafe (point 2) */
-	if (safetyInfo->unsafeVolatile &&
-		contain_volatile_functions(qual))
+	if (safetyInfo->unsafeVolatile && rinfo->has_volatile)
 		return false;
 
 	/* Refuse leaky quals if told to (point 3) */
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..8c447cf0a2 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2502,7 +2502,7 @@ match_opclause_to_indexcol(PlannerInfo *root,
 	 */
 	if (match_index_to_operand(leftop, indexcol, index) &&
 		!bms_is_member(index_relid, rinfo->right_relids) &&
-		!contain_volatile_functions(rightop))
+		(!rinfo->has_volatile || !contain_volatile_functions(rightop)))
 	{
 		if (IndexCollMatchesExprColl(idxcollation, expr_coll) &&
 			op_in_opfamily(expr_op, opfamily))
@@ -2531,7 +2531,7 @@ match_opclause_to_indexcol(PlannerInfo *root,
 
 	if (match_index_to_operand(rightop, indexcol, index) &&
 		!bms_is_member(index_relid, rinfo->left_relids) &&
-		!contain_volatile_functions(leftop))
+		(!rinfo->has_volatile || !contain_volatile_functions(leftop)))
 	{
 		if (IndexCollMatchesExprColl(idxcollation, expr_coll))
 		{
@@ -2723,7 +2723,7 @@ match_saopclause_to_indexcol(PlannerInfo *root,
 	 */
 	if (match_index_to_operand(leftop, indexcol, index) &&
 		!bms_is_member(index_relid, right_relids) &&
-		!contain_volatile_functions(rightop))
+		(!rinfo->has_volatile || !contain_volatile_functions(rightop)))
 	{
 		if (IndexCollMatchesExprColl(idxcollation, expr_coll) &&
 			op_in_opfamily(expr_op, opfamily))
@@ -2805,14 +2805,14 @@ match_rowcompare_to_indexcol(PlannerInfo *root,
 	 */
 	if (match_index_to_operand(leftop, indexcol, index) &&
 		!bms_is_member(index_relid, pull_varnos(root, rightop)) &&
-		!contain_volatile_functions(rightop))
+		(!rinfo->has_volatile || !contain_volatile_functions(rightop)))
 	{
 		/* OK, indexkey is on left */
 		var_on_left = true;
 	}
 	else if (match_index_to_operand(rightop, indexcol, index) &&
 			 !bms_is_member(index_relid, pull_varnos(root, leftop)) &&
-			 !contain_volatile_functions(leftop))
+			 (!rinfo->has_volatile || !contain_volatile_functions(leftop)))
 	{
 		/* indexkey is on right, so commute the operator */
 		expr_op = get_commutator(expr_op);
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 0725d950c5..e40df11b19 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -84,6 +84,9 @@ IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	/* Must be an OpExpr */
 	if (!is_opclause(rinfo->clause))
 		return false;
+	/* Must not contain any volatile functions */
+	if (rinfo->has_volatile)
+		return false;
 	node = (OpExpr *) rinfo->clause;
 
 	/* OpExpr must have two arguments */
@@ -111,8 +114,7 @@ IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 
 	/* The other argument must be a pseudoconstant */
-	if (bms_is_member(rel->relid, other_relids) ||
-		contain_volatile_functions(other))
+	if (bms_is_member(rel->relid, other_relids))
 		return false;
 
 	return true;				/* success */
@@ -178,6 +180,9 @@ IsTidEqualAnyClause(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
 	/* Must be a ScalarArrayOpExpr */
 	if (!(rinfo->clause && IsA(rinfo->clause, ScalarArrayOpExpr)))
 		return false;
+	/* We can safely reject if it's marked as volatile */
+	if (rinfo->has_volatile)
+		return false;
 	node = (ScalarArrayOpExpr *) rinfo->clause;
 
 	/* Operator must be tideq */
@@ -194,8 +199,7 @@ IsTidEqualAnyClause(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
 		IsCTIDVar((Var *) arg1, rel))
 	{
 		/* The other argument must be a pseudoconstant */
-		if (bms_is_member(rel->relid, pull_varnos(root, arg2)) ||
-			contain_volatile_functions(arg2))
+		if (bms_is_member(rel->relid, pull_varnos(root, arg2)))
 			return false;
 
 		return true;			/* success */
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..9914d230ed 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2652,12 +2652,13 @@ check_mergejoinable(RestrictInfo *restrictinfo)
 		return;
 	if (list_length(((OpExpr *) clause)->args) != 2)
 		return;
+	if (restrictinfo->has_volatile)
+		return;
 
 	opno = ((OpExpr *) clause)->opno;
 	leftarg = linitial(((OpExpr *) clause)->args);
 
-	if (op_mergejoinable(opno, exprType(leftarg)) &&
-		!contain_volatile_functions((Node *) clause))
+	if (op_mergejoinable(opno, exprType(leftarg)))
 		restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno);
 
 	/*
@@ -2689,11 +2690,12 @@ check_hashjoinable(RestrictInfo *restrictinfo)
 		return;
 	if (list_length(((OpExpr *) clause)->args) != 2)
 		return;
+	if (restrictinfo->has_volatile)
+		return;
 
 	opno = ((OpExpr *) clause)->opno;
 	leftarg = linitial(((OpExpr *) clause)->args);
 
-	if (op_hashjoinable(opno, exprType(leftarg)) &&
-		!contain_volatile_functions((Node *) clause))
+	if (op_hashjoinable(opno, exprType(leftarg)))
 		restrictinfo->hashjoinoperator = opno;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 424d25cbd5..20adb77ccc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5903,7 +5903,13 @@ make_sort_input_target(PlannerInfo *root,
 				col_is_srf[i] = true;
 				have_srf = true;
 			}
-			else if (contain_volatile_functions((Node *) expr))
+
+			/*
+			 * We need only check if expr is volatile if the final_target has
+			 * any volatile functions.
+			 */
+			else if (final_target->has_volatile_expr &&
+					 contain_volatile_functions((Node *) expr))
 			{
 				/* Unconditionally postpone */
 				postpone_col[i] = true;
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index d559f33826..d9f6c44079 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -133,17 +133,18 @@ is_safe_restriction_clause_for(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	/*
 	 * We want clauses that mention the rel, and only the rel.  So in
-	 * particular pseudoconstant clauses can be rejected quickly.  Then check
-	 * the clause's Var membership.
+	 * particular pseudoconstant clauses can be rejected quickly.  Also,
+	 * checking volatility is cheap too, so do these before checking the
+	 * clause's Var membership.
 	 */
 	if (rinfo->pseudoconstant)
 		return false;
+	/* We don't want extra evaluations of any volatile functions */
+	if (rinfo->has_volatile)
+		return false;
 	if (!bms_equal(rinfo->clause_relids, rel->relids))
 		return false;
 
-	/* We don't want extra evaluations of any volatile functions */
-	if (contain_volatile_functions((Node *) rinfo->clause))
-		return false;
 
 	return true;
 }
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index eb113d94c1..f1d068c2fe 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -137,6 +137,7 @@ make_restrictinfo_internal(PlannerInfo *root,
 	else
 		restrictinfo->leakproof = false;	/* really, "don't know" */
 
+	restrictinfo->has_volatile = contain_volatile_functions((Node *) clause);
 	/*
 	 * If it's a binary opclause, set up left/right relids info. In any case
 	 * set up the total clause relids info.
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 89853a0630..9cf9d45347 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -623,6 +623,9 @@ make_pathtarget_from_tlist(List *tlist)
 		i++;
 	}
 
+	/* cache whether the tlist has any volatile functions */
+	target->has_volatile_expr = contain_volatile_functions((Node *) tlist);
+
 	return target;
 }
 
@@ -724,6 +727,10 @@ add_column_to_pathtarget(PathTarget *target, Expr *expr, Index sortgroupref)
 		target->sortgrouprefs = (Index *) palloc0(nexprs * sizeof(Index));
 		target->sortgrouprefs[nexprs - 1] = sortgroupref;
 	}
+
+	/* Check for new volatile functions, unless we already have one */
+	if (!target->has_volatile_expr)
+		target->has_volatile_expr = contain_volatile_functions((Node *) expr);
 }
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 86405a274e..4526ae4297 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1087,6 +1087,8 @@ typedef struct PathTarget
 	Index	   *sortgrouprefs;	/* corresponding sort/group refnos, or 0 */
 	QualCost	cost;			/* cost of evaluating the expressions */
 	int			width;			/* estimated avg width of result tuples */
+	bool		has_volatile_expr;	/* True if any of 'exprs' has a volatile
+									 * function. */
 } PathTarget;
 
 /* Convenience macro to get a sort/group refno from a PathTarget */
@@ -2017,6 +2019,8 @@ typedef struct RestrictInfo
 
 	bool		leakproof;		/* true if known to contain no leaked Vars */
 
+	bool		has_volatile;	/* true if clause contains a volatile func */
+
 	Index		security_level; /* see comment above */
 
 	/* The set of relids (varnos) actually referenced in the clause: */
-- 
2.27.0

resultcache_v14_vs_v15.pngimage/png; name=resultcache_v14_vs_v15.pngDownload
v15-0002-Allow-estimate_num_groups-to-pass-back-further-d.patchtext/plain; charset=US-ASCII; name=v15-0002-Allow-estimate_num_groups-to-pass-back-further-d.patchDownload
From 5e9d5a0efc45a252c8201c29ca6858bd25137d82 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v15 2/5] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() that lets it
set a flags variable with bits that pass back additional details to the
caller, which may be useful for decision making.

For now, the only new flag is one which indicates whether the estimation
fell back on using the hard-coded default constants at any point.
Callers may wish to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..ed33d819e7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3086,7 +3086,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a25b674a19..b92c948588 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1969,7 +1969,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 8c447cf0a2..8de302ddd3 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 20adb77ccc..3f9344b026 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3717,7 +3717,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3742,7 +3743,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3758,7 +3760,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4807,7 +4809,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..d5c66780ac 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1713,6 +1713,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 52314d3aa1..2306602a51 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0
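
To give a feel for how a caller might consume the new output parameter, here
is a rough sketch of a hypothetical planner-side caller (none of this code is
in the patch; "root", "param_exprs" and "calls" are made up for illustration,
only estimate_num_groups(), EstimationInfo and SELFLAG_USED_DEFAULT come from
0002):

    EstimationInfo estinfo;
    double      ndistinct;

    /* Ask for the group estimate and collect the estimation flags too. */
    ndistinct = estimate_num_groups(root, param_exprs, calls, NULL, &estinfo);

    /*
     * If any part of the estimate fell back on DEFAULT_NUM_DISTINCT, it's
     * probably not trustworthy enough to base a caching decision on, so a
     * caller could simply decline to build a Result Cache path here.
     */
    if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
        return;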

Attachment: v15-0003-Allow-users-of-simplehash.h-to-perform-direct-de.patch (text/plain; charset=US-ASCII)
From 55f122f7a21d8cab1503622bf47d580c4749b037 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v15 3/5] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, so it can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0
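
In simplehash terms, the difference between the existing delete-by-key API and
the new delete-by-entry API looks roughly like this (just a sketch; the
"myhash" prefix, "htab", "key" and the MyHashEntry element type are invented
for illustration):

    MyHashEntry *entry;

    /* Existing API: delete by key, which re-probes the table internally */
    myhash_delete(htab, key);

    /*
     * New API from this patch: when we already hold a pointer to the entry,
     * for example from an earlier myhash_lookup() or from an iterator, we
     * can delete it directly and skip the extra lookup.
     */
    entry = myhash_lookup(htab, key);
    if (entry != NULL)
        myhash_delete_item(htab, entry);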

Attachment: v15-0004-Add-Result-Cache-executor-node.patch (text/plain; charset=US-ASCII)
From e379cd018d91c4191419ff0403776cd4521c7137 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v15 4/5] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven by the expected cache hit ratio.  Estimating that ratio relies on
having good ndistinct estimates for the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some of the
inner-side rows would never be looked up and a hash join's hash table would
have exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   25 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  283 +++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2822 insertions(+), 186 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql
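
As a rough illustration of the hit ratio idea mentioned above (this is not the
patch's actual costing code, just a back-of-envelope sketch assuming uniformly
distributed parameter values; the function name and arguments are invented):

    #include <math.h>

    static double
    estimate_hit_ratio(double calls, double ndistinct, double est_cache_entries)
    {
        double      misses;
        double      hits;

        if (calls <= 0)
            return 0.0;

        /* Under a uniform distribution, each distinct value misses once... */
        misses = fmin(ndistinct, calls);
        hits = calls - misses;

        /* ...and repeats only hit for values that still fit within work_mem */
        if (est_cache_entries < ndistinct)
            hits *= est_cache_entries / ndistinct;

        return hits / calls;
    }

The real costing in the patch is more involved than this, but the expected hit
ratio is the main driver of whether the Result Cache path wins.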

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..613c46f886 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2139,22 +2141,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..4a544a3ab5 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a218d78bef..9794943f1c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1770,8 +1770,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4903,6 +4904,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..e42983da02 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1284,6 +1286,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1993,6 +1998,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3066,6 +3075,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 74ac59faa1..c6bffaf199 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 4543ac79ed..18cbfdaeac 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -254,6 +255,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 29766d8196..9f8c7582e0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -325,6 +326,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -713,6 +719,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..4ff8000003
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that older items always bubble to the top
+ * of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry contains no tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Record whether we can assume a cache entry is complete after we fetch
+	 * the first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * can skip to the next outer tuple after getting the first matching
+	 * inner tuple.  In that case, the cache entry is complete after getting
+	 * the first tuple, so we mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 493a856745..bd6e4464d4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -947,6 +947,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -5006,6 +5033,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 73dd2255af..7c30a09ba5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -845,6 +845,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1918,6 +1933,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3879,6 +3909,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4113,6 +4146,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c5e136e9c3..cee654cbc0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2207,6 +2207,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2895,6 +2915,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5ac993042e..d2de2d7c65 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4031,6 +4031,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b92c948588..9dfd0fb4ff 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2401,6 +2403,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
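+	 *
+	 * As an illustrative example, with ndistinct = 500 and est_cache_entries
+	 * = 100, evict_ratio below works out to 1.0 - 100 / 500 = 0.8, i.e. only
+	 * a fifth of the distinct parameter values fit in the cache at once.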
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
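+	 *
+	 * As an illustrative example, with calls = 10000, ndistinct = 100 and
+	 * est_cache_entries >= 100, every distinct value fits in the cache and
+	 * the formula below gives 1.0 / 100 * 100 - (100 / 10000) = 0.99, i.e.
+	 * only the first scan for each distinct parameter value is a miss.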
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4141,6 +4284,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..3cb5203a3a 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,249 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * find_resultcache_hashop
+ *		Find the hash equality operator for 'typeoid'.
+ *
+ * 'rinfo' must be the RestrictInfo for the qual that we're looking up the
+ * hash equality operator for.
+ *
+ * The given rinfo may have been previously determined to be hash-joinable.
+ * In that case we can simply return the hashjoinoperator.  If the rinfo was
+ * not determined to be hash-joinable, the qual may still be usable by Result
+ * Cache; we just need to check whether there's a valid hash operator for the
+ * given type.
+ */
+static inline Oid
+find_resultcache_hashop(RestrictInfo *rinfo, Oid typeoid)
+{
+	TypeCacheEntry *typentry;
+
+	/*
+	 * Since equality joins are common, it seems worth seeing if this is
+	 * already set to what we need.
+	 */
+	if (OidIsValid(rinfo->hashjoinoperator))
+		return rinfo->hashjoinoperator;
+
+	/* Reject the qual if there are volatile functions */
+	if (rinfo->has_volatile)
+		return InvalidOid;
+
+	/* Perform a manual lookup */
+	typentry = lookup_type_cache(typeoid, TYPECACHE_HASH_PROC |
+										  TYPECACHE_EQ_OPR);
+
+	if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		return InvalidOid;
+
+	return typentry->eq_opr;
+}
+
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * Additionally, we fetch the outer-side exprs and check that there's a valid
+ * hashable equality operator for each one.  Returns true and sets the
+ * 'param_exprs' and 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	ListCell   *lc;
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Oid			hasheqop;
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* We only support OpExprs with 2 args */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) linitial(opexpr->args);
+			else
+				expr = (Node *) lsecond(opexpr->args);
+
+			/* see if there's a valid hash equals operator for this type */
+			hasheqop = find_resultcache_hashop(rinfo, exprType(expr));
+
+			/* can't use result cache without a valid hash equals operator */
+			if (!OidIsValid(hasheqop))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, hasheqop);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+		TypeCacheEntry *typentry;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			bms_free(var_relids);
+			return false;
+		}
+		bms_free(var_relids);
+
+		/* Reject if there are any volatile functions */
+		if (contain_volatile_functions(expr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* can't use result cache without a valid hash equals operator */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We're okay to use result cache */
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+	ListCell   *lc;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (innerrel->reltarget->has_volatile_expr)
+		return NULL;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->has_volatile)
+			return NULL;
+	}
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1725,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1734,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1904,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1929,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..5d0e908d05 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -276,6 +279,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -451,6 +459,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1524,6 +1537,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6442,6 +6505,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -7028,6 +7113,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -7073,6 +7159,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..9c166f621d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -751,6 +751,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..1ad44e6ead 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2754,6 +2754,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d5c66780ac..3f654e1155 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1576,6 +1576,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3876,6 +3926,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4094,6 +4155,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 855076b1fd..e1425270df 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1019,6 +1019,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f46c2dd7a8..1f54e1c2f4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..a71b0e5242 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1999,6 +2000,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e22df890ef..f0b3cc54f0 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -74,6 +74,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -132,6 +133,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -242,6 +244,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4526ae4297..5182a52415 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1478,6 +1478,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e. a cache over a
+ * parameterized path that saves the underlying node from being rescanned for
+ * parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 95292d7573..678f53a807 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -775,6 +775,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 1be93be098..67f925e793 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..fe8a2dbd39 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 2c818d9253..dcdb7526a4 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2584,6 +2584,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2599,6 +2600,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..c8706110c3
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 6d048e309c..a243b862d0 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -110,10 +110,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e280198b17..585814ad9e 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -115,7 +115,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6a57e889a1..577e173d32 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -201,6 +201,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index f9579af19a..287acbf694 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1098,9 +1098,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..b352f21ba1
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

Attachment: v15-0005-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain)
From 7c9417b7632420dc6ec63c7f0b2cc676e3034778 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v15 5/5] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 4ff8000003..4d6cd9ecfe 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns 'entry', which may have
+ *		changed address because removing other entries can cause simplehash to
+ *		shuffle the remaining entries back to their optimal positions.  Returns
+ *		NULL if the attempt to free enough memory resulted in 'entry' itself
+ *		being evicted from the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#93)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 23 Feb 2021 at 18:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I doubt it's that bad. We could cache such info in RestrictInfo
>> for quals, or PathTarget for tlists, without much new notational
>> overhead. That doesn't cover everything the planner deals with
>> of course, but it would cover enough that you'd be chasing pretty
>> small returns to worry about more.
>
> This seems like a pretty good idea. So I coded it up.
>
> The 0001 patch adds a has_volatile bool field to RestrictInfo and sets
> it when building the RestrictInfo.

I'm -1 on doing it exactly that way, because you're expending
the cost of those lookups without certainty that you need the answer.
I had in mind something more like the way that we cache selectivity
estimates in RestrictInfo, in which the value is cached when first
demanded and then re-used on subsequent checks --- see in
clause_selectivity_ext, around line 750. You do need a way for the
field to have a "not known yet" value, but that's not hard. Moreover,
this sort of approach can be less invasive than what you did here,
because the caching behavior can be hidden inside
contain_volatile_functions, rather than having all the call sites
know about it explicitly.

regards, tom lane
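
In concrete terms, the demand-driven caching being suggested here amounts to something like the sketch below, which computes the answer on first use and remembers it on the RestrictInfo.  The tri-state field and enum names are only illustrative (they happen to match the ones the v16-0001 patch later in this thread ends up using), and that patch folds the check into contain_volatile_functions_walker() rather than the outer function:

bool
contain_volatile_functions(Node *clause)
{
	/* Illustrative only: lazily cache the walker's answer on RestrictInfo. */
	if (clause && IsA(clause, RestrictInfo))
	{
		RestrictInfo *rinfo = (RestrictInfo *) clause;

		if (rinfo->has_volatile == VOLATILITY_UNKNOWN)
			rinfo->has_volatile =
				contain_volatile_functions_walker((Node *) rinfo->clause, NULL)
				? VOLATILITY_VOLATILE : VOLATILITY_NOVOLATILE;

		return (rinfo->has_volatile == VOLATILITY_VOLATILE);
	}

	return contain_volatile_functions_walker(clause, NULL);
}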

#95David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#94)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, 12 Mar 2021 at 14:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The 0001 patch adds a has_volatile bool field to RestrictInfo and sets
it when building the RestrictInfo.

I'm -1 on doing it exactly that way, because you're expending
the cost of those lookups without certainty that you need the answer.
I had in mind something more like the way that we cache selectivity
estimates in RestrictInfo, in which the value is cached when first
demanded and then re-used on subsequent checks --- see in
clause_selectivity_ext, around line 750. You do need a way for the
field to have a "not known yet" value, but that's not hard. Moreover,
this sort of approach can be less invasive than what you did here,
because the caching behavior can be hidden inside
contain_volatile_functions, rather than having all the call sites
know about it explicitly.

I was aware that the selectivity code did things that way. However, I
didn't copy it as we have functions like match_opclause_to_indexcol()
and match_saopclause_to_indexcol(), which call
contain_volatile_functions() on just a single operand of an OpExpr.
We'd have no chance to cache the volatility property on the first
lookup since we'd not have the RestrictInfo to set it in. I didn't
think that was great, so it led me down the path of setting it always
rather than on the first volatility lookup.

I had in mind that most RestrictInfos would get tested between
checking for hash and merge joinability and index compatibility.
However, I think baserestrictinfos that reference non-indexed columns
won't get checked, so the way I've done it will be a bit wasteful, as
you mention.

David

#96David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#94)
5 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, 12 Mar 2021 at 14:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Tue, 23 Feb 2021 at 18:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I doubt it's that bad. We could cache such info in RestrictInfo
for quals, or PathTarget for tlists, without much new notational
overhead. That doesn't cover everything the planner deals with
of course, but it would cover enough that you'd be chasing pretty
small returns to worry about more.

This seems like a pretty good idea. So I coded it up.

The 0001 patch adds a has_volatile bool field to RestrictInfo and sets
it when building the RestrictInfo.

I'm -1 on doing it exactly that way, because you're expending
the cost of those lookups without certainty that you need the answer.
I had in mind something more like the way that we cache selectivity
estimates in RestrictInfo, in which the value is cached when first
demanded and then re-used on subsequent checks --- see in
clause_selectivity_ext, around line 750. You do need a way for the
field to have a "not known yet" value, but that's not hard. Moreover,
this sort of approach can be less invasive than what you did here,
because the caching behavior can be hidden inside
contain_volatile_functions, rather than having all the call sites
know about it explicitly.

I coded up something more along the lines of what I think you had in
mind for the 0001 patch.

Updated patches attached.

David

Attachments:

v16-0001-Cache-PathTarget-and-RestrictInfo-s-volatility.patch
From 9fdcb7aba6078788bcd23670f955c0dd8e60f493 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Wed, 10 Mar 2021 22:57:33 +1300
Subject: [PATCH v16 1/5] Cache PathTarget and RestrictInfo's volatility

This aims to reduce the number of times we make calls to
contain_volatile_functions().  This really does not save us much with the
existing set of calls to contain_volatile_functions(); however, it will
save a significant number of calls in an upcoming patch which must check
this during the join search.
---
 src/backend/nodes/copyfuncs.c             |  1 +
 src/backend/nodes/outfuncs.c              |  2 +
 src/backend/optimizer/path/allpaths.c     | 40 ++++++++++---------
 src/backend/optimizer/plan/initsplan.c    |  2 +-
 src/backend/optimizer/plan/planner.c      |  2 +-
 src/backend/optimizer/util/clauses.c      | 47 +++++++++++++++++++++++
 src/backend/optimizer/util/restrictinfo.c |  2 +
 src/backend/optimizer/util/tlist.c        | 10 +++++
 src/include/nodes/pathnodes.h             | 16 +++++++-
 9 files changed, 101 insertions(+), 21 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index da91cbd2b1..493a856745 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2310,6 +2310,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_SCALAR_FIELD(can_join);
 	COPY_SCALAR_FIELD(pseudoconstant);
 	COPY_SCALAR_FIELD(leakproof);
+	COPY_SCALAR_FIELD(has_volatile);
 	COPY_SCALAR_FIELD(security_level);
 	COPY_BITMAPSET_FIELD(clause_relids);
 	COPY_BITMAPSET_FIELD(required_relids);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6493a03ff8..afd281ab5a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2473,6 +2473,7 @@ _outPathTarget(StringInfo str, const PathTarget *node)
 	WRITE_FLOAT_FIELD(cost.startup, "%.2f");
 	WRITE_FLOAT_FIELD(cost.per_tuple, "%.2f");
 	WRITE_INT_FIELD(width);
+	WRITE_ENUM_FIELD(has_volatile_expr, VolatileFunctions);
 }
 
 static void
@@ -2497,6 +2498,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_BOOL_FIELD(can_join);
 	WRITE_BOOL_FIELD(pseudoconstant);
 	WRITE_BOOL_FIELD(leakproof);
+	WRITE_ENUM_FIELD(has_volatile, VolatileFunctions);
 	WRITE_UINT_FIELD(security_level);
 	WRITE_BITMAPSET_FIELD(clause_relids);
 	WRITE_BITMAPSET_FIELD(required_relids);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..e2510235ef 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -134,7 +134,8 @@ static void check_output_expressions(Query *subquery,
 static void compare_tlist_datatypes(List *tlist, List *colTypes,
 									pushdown_safety_info *safetyInfo);
 static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
-static bool qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
+static bool qual_is_pushdown_safe(Query *subquery, Index rti,
+								  RestrictInfo *rinfo,
 								  pushdown_safety_info *safetyInfo);
 static void subquery_push_qual(Query *subquery,
 							   RangeTblEntry *rte, Index rti, Node *qual);
@@ -2177,11 +2178,12 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		foreach(l, rel->baserestrictinfo)
 		{
 			RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
-			Node	   *clause = (Node *) rinfo->clause;
 
 			if (!rinfo->pseudoconstant &&
-				qual_is_pushdown_safe(subquery, rti, clause, &safetyInfo))
+				qual_is_pushdown_safe(subquery, rti, rinfo, &safetyInfo))
 			{
+				Node	   *clause = (Node *)rinfo->clause;
+
 				/* Push it down */
 				subquery_push_qual(subquery, rte, rti, clause);
 			}
@@ -3390,37 +3392,39 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
 }
 
 /*
- * qual_is_pushdown_safe - is a particular qual safe to push down?
+ * qual_is_pushdown_safe - is a particular rinfo safe to push down?
  *
- * qual is a restriction clause applying to the given subquery (whose RTE
+ * rinfo is a restriction clause applying to the given subquery (whose RTE
  * has index rti in the parent query).
  *
  * Conditions checked here:
  *
- * 1. The qual must not contain any SubPlans (mainly because I'm not sure
- * it will work correctly: SubLinks will already have been transformed into
- * SubPlans in the qual, but not in the subquery).  Note that SubLinks that
- * transform to initplans are safe, and will be accepted here because what
- * we'll see in the qual is just a Param referencing the initplan output.
+ * 1. rinfo's clause must not contain any SubPlans (mainly because it's
+ * unclear that it will work correctly: SubLinks will already have been
+ * transformed into SubPlans in the qual, but not in the subquery).  Note that
+ * SubLinks that transform to initplans are safe, and will be accepted here
+ * because what we'll see in the qual is just a Param referencing the initplan
+ * output.
  *
- * 2. If unsafeVolatile is set, the qual must not contain any volatile
+ * 2. If unsafeVolatile is set, rinfo's clause must not contain any volatile
  * functions.
  *
- * 3. If unsafeLeaky is set, the qual must not contain any leaky functions
- * that are passed Var nodes, and therefore might reveal values from the
- * subquery as side effects.
+ * 3. If unsafeLeaky is set, rinfo's clause must not contain any leaky
+ * functions that are passed Var nodes, and therefore might reveal values from
+ * the subquery as side effects.
  *
- * 4. The qual must not refer to the whole-row output of the subquery
+ * 4. rinfo's clause must not refer to the whole-row output of the subquery
  * (since there is no easy way to name that within the subquery itself).
  *
- * 5. The qual must not refer to any subquery output columns that were
+ * 5. rinfo's clause must not refer to any subquery output columns that were
  * found to be unsafe to reference by subquery_is_pushdown_safe().
  */
 static bool
-qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
+qual_is_pushdown_safe(Query *subquery, Index rti, RestrictInfo *rinfo,
 					  pushdown_safety_info *safetyInfo)
 {
 	bool		safe = true;
+	Node	   *qual = (Node *) rinfo->clause;
 	List	   *vars;
 	ListCell   *vl;
 
@@ -3430,7 +3434,7 @@ qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
 
 	/* Refuse volatile quals if we found they'd be unsafe (point 2) */
 	if (safetyInfo->unsafeVolatile &&
-		contain_volatile_functions(qual))
+		contain_volatile_functions((Node *) rinfo))
 		return false;
 
 	/* Refuse leaky quals if told to (point 3) */
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..efca702891 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2653,7 +2653,7 @@ check_mergejoinable(RestrictInfo *restrictinfo)
 	if (list_length(((OpExpr *) clause)->args) != 2)
 		return;
 
-	opno = ((OpExpr *) clause)->opno;
+	opno = ((OpExpr *)clause)->opno;
 	leftarg = linitial(((OpExpr *) clause)->args);
 
 	if (op_mergejoinable(opno, exprType(leftarg)) &&
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 424d25cbd5..40476cc18d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5903,7 +5903,7 @@ make_sort_input_target(PlannerInfo *root,
 				col_is_srf[i] = true;
 				have_srf = true;
 			}
-			else if (contain_volatile_functions((Node *) expr))
+			else if (contain_volatile_functions((Node *)expr))
 			{
 				/* Unconditionally postpone */
 				postpone_col[i] = true;
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 7e25f94293..bda6e58b5d 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -487,6 +487,53 @@ contain_volatile_functions_walker(Node *node, void *context)
 		return true;
 	}
 
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) node;
+
+		if (rinfo->has_volatile == VOLATILITY_NOVOLATILE)
+			return false;
+		else if (rinfo->has_volatile == VOLATILITY_VOLATILE)
+			return true;
+		else
+		{
+			bool hasvolatile;
+
+			hasvolatile = contain_volatile_functions_walker((Node *) rinfo->clause,
+															context);
+			if (hasvolatile)
+				rinfo->has_volatile = VOLATILITY_VOLATILE;
+			else
+				rinfo->has_volatile = VOLATILITY_NOVOLATILE;
+
+			return hasvolatile;
+		}
+	}
+
+	if (IsA(node, PathTarget))
+	{
+		PathTarget *target = (PathTarget *) node;
+
+		if (target->has_volatile_expr == VOLATILITY_NOVOLATILE)
+			return false;
+		else if (target->has_volatile_expr == VOLATILITY_VOLATILE)
+			return true;
+		else
+		{
+			bool hasvolatile;
+
+			hasvolatile = contain_volatile_functions_walker((Node *) target->exprs,
+															context);
+
+			if (hasvolatile)
+				target->has_volatile_expr = VOLATILITY_VOLATILE;
+			else
+				target->has_volatile_expr = VOLATILITY_NOVOLATILE;
+
+			return hasvolatile;
+		}
+	}
+
 	/*
 	 * See notes in contain_mutable_functions_walker about why we treat
 	 * MinMaxExpr, XmlExpr, and CoerceToDomain as immutable, while
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index eb113d94c1..e247b41c20 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -137,6 +137,8 @@ make_restrictinfo_internal(PlannerInfo *root,
 	else
 		restrictinfo->leakproof = false;	/* really, "don't know" */
 
+	restrictinfo->has_volatile = VOLATILITY_UNKNOWN;
+
 	/*
 	 * If it's a binary opclause, set up left/right relids info. In any case
 	 * set up the total clause relids info.
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 89853a0630..7779aab44b 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -623,6 +623,9 @@ make_pathtarget_from_tlist(List *tlist)
 		i++;
 	}
 
+	/* cache whether the tlist has any volatile functions */
+	target->has_volatile_expr = VOLATILITY_UNKNOWN;
+
 	return target;
 }
 
@@ -724,6 +727,13 @@ add_column_to_pathtarget(PathTarget *target, Expr *expr, Index sortgroupref)
 		target->sortgrouprefs = (Index *) palloc0(nexprs * sizeof(Index));
 		target->sortgrouprefs[nexprs - 1] = sortgroupref;
 	}
+
+	/*
+	 * Set has_volatile_expr to UNKNOWN in case the new expr contains a
+	 * volatile function.
+	 */
+	if (target->has_volatile_expr == VOLATILITY_NOVOLATILE)
+		target->has_volatile_expr = VOLATILITY_UNKNOWN;
 }
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 86405a274e..84e2fe186d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1056,6 +1056,16 @@ typedef struct PathKey
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
 } PathKey;
 
+/*
+ * VolatileFunctions -- allows nodes to cache their contain_volatile_functions
+ * properties. VOLATILITY_UNKNOWN means not yet determined.
+ */
+typedef enum VolatileFunctions
+{
+	VOLATILITY_UNKNOWN = 0,
+	VOLATILITY_VOLATILE,
+	VOLATILITY_NOVOLATILE
+} VolatileFunctions;
 
 /*
  * PathTarget
@@ -1087,6 +1097,8 @@ typedef struct PathTarget
 	Index	   *sortgrouprefs;	/* corresponding sort/group refnos, or 0 */
 	QualCost	cost;			/* cost of evaluating the expressions */
 	int			width;			/* estimated avg width of result tuples */
+	VolatileFunctions	has_volatile_expr;	/* indicates if exprs contain any
+											 * volatile functions. */
 } PathTarget;
 
 /* Convenience macro to get a sort/group refno from a PathTarget */
@@ -1860,7 +1872,6 @@ typedef struct LimitPath
 	LimitOption limitOption;	/* FETCH FIRST with ties or exact number */
 } LimitPath;
 
-
 /*
  * Restriction clause info.
  *
@@ -2017,6 +2028,9 @@ typedef struct RestrictInfo
 
 	bool		leakproof;		/* true if known to contain no leaked Vars */
 
+	VolatileFunctions	has_volatile;	/* to indicate if clause contains any
+										 * volatile functions. */
+
 	Index		security_level; /* see comment above */
 
 	/* The set of relids (varnos) actually referenced in the clause: */
-- 
2.27.0

v16-0002-Allow-estimate_num_groups-to-pass-back-further-d.patch
From 4c62f8ab62c23d91d6425f09a872b0d4f9e50d28 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v16 2/5] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() to allow it to
set a flags variable with some bits that pass back additional details to
the caller, which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..ed33d819e7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3086,7 +3086,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a25b674a19..b92c948588 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1969,7 +1969,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 40476cc18d..ae79732999 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3717,7 +3717,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3742,7 +3743,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3758,7 +3760,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4807,7 +4809,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..d5c66780ac 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1713,6 +1713,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 52314d3aa1..2306602a51 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0
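
Since this patch deliberately adds the flags without any users, the following is a hypothetical illustration of how a later caller might consult SELFLAG_USED_DEFAULT.  "param_exprs" and "outer_rows" are placeholder names, and whether the eventual result-cache costing does exactly this is not shown in this patch:

	EstimationInfo estinfo;
	double		ndistinct;

	ndistinct = estimate_num_groups(root, param_exprs, outer_rows,
									NULL, &estinfo);

	/*
	 * If any part of the estimate fell back on DEFAULT_NUM_DISTINCT, be
	 * pessimistic and assume every outer row carries a distinct parameter
	 * value, i.e. that a cache would see no hits.
	 */
	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
		ndistinct = outer_rows;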

v16-0003-Allow-users-of-simplehash.h-to-perform-direct-de.patch
From 11e078ea7c7678151d75d5afbf9d0623969c35ad Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v16 3/5] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, and so can do so
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0
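
To make the new entry point concrete, here is a small self-contained instantiation showing delete-by-key next to the new delete-by-entry.  The "demotab"/"DemoEntry" names are invented for illustration and are not part of the patch:

#include "postgres.h"
#include "common/hashfn.h"

typedef struct DemoEntry
{
	uint32		key;
	uint32		status;			/* state field required by simplehash.h */
} DemoEntry;

#define SH_PREFIX demotab
#define SH_ELEMENT_TYPE DemoEntry
#define SH_KEY_TYPE uint32
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32(key)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static void
demo(void)
{
	bool		found;
	demotab_hash *tb = demotab_create(CurrentMemoryContext, 256, NULL);
	DemoEntry  *entry = demotab_insert(tb, 42, &found);

	/* Existing API: deleting by key re-hashes and re-probes the table. */
	demotab_delete(tb, entry->key);

	/* New API: when we already hold the element, delete it directly. */
	entry = demotab_insert(tb, 43, &found);
	demotab_delete_item(tb, entry);
}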

v16-0004-Add-Result-Cache-executor-node.patch
From ecb53aa82787f783e680fca4be27954b051ace3d Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v16 4/5] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   25 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  283 +++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2822 insertions(+), 186 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..613c46f886 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2139,22 +2141,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..4a544a3ab5 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a218d78bef..9794943f1c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1770,8 +1770,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4903,6 +4904,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans to the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..e42983da02 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1284,6 +1286,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1993,6 +1998,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3066,6 +3075,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 74ac59faa1..c6bffaf199 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 4543ac79ed..18cbfdaeac 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -254,6 +255,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqFunctions: array of function oids of the equality functions to use
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 29766d8196..9f8c7582e0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -325,6 +326,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -713,6 +719,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..4ff8000003
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used cache entry.
+ * We track which entry was used least recently by pushing new entries, and
+ * entries we look up, onto the tail of a doubly linked list.  This means
+ * that the least recently used items gather at the head of this LRU list.
+ *
+ * Sometimes our callers won't run their scans to completion. For example, a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
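+ * For example (an illustrative plan shape only, with made-up table names), a
+ * Result Cache will typically appear on the inner side of a parameterized
+ * Nested Loop:
+ *
+ *		Nested Loop
+ *		  ->  Seq Scan on some_outer_table
+ *		  ->  Result Cache
+ *				->  Index Scan on some_inner_table
+ *					  Index Cond: (x = some_outer_table.x)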
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
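+/*
+ * Typical state progression (see ExecResultCache() below): each scan starts
+ * in RC_CACHE_LOOKUP.  A cache hit moves to RC_CACHE_FETCH_NEXT_TUPLE, a
+ * cache miss to RC_FILLING_CACHE, or to RC_CACHE_BYPASS_MODE when the
+ * current entry's tuples won't fit within the memory budget.  All states
+ * finish in RC_END_OF_SCAN, and a rescan resets back to RC_CACHE_LOOKUP.
+ */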
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/*
+ * ResultCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
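+/*
+ * The simplehash.h inclusion above, with SH_DECLARE, emits the hash table
+ * type and function declarations.  This second inclusion, with SH_DEFINE,
+ * emits the implementation, which needs the hash and equality support
+ * functions declared just above.
+ */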
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Look up or add an entry for the current parameters.  No need to pass
+	 * a valid key; the hash function uses rcstate's probeslot, populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case for creating
+	 * the first entry.
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple, so we can mark it as such.
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 493a856745..bd6e4464d4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -947,6 +947,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -5006,6 +5033,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index afd281ab5a..555d3add61 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -845,6 +845,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1918,6 +1933,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3879,6 +3909,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4113,6 +4146,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c5e136e9c3..cee654cbc0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2207,6 +2207,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2895,6 +2915,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e2510235ef..cd4d76bcfd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4032,6 +4032,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b92c948588..9dfd0fb4ff 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2401,6 +2403,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
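+
+	/*
+	 * For example, with calls = 1000, ndistinct = 10 and room for all 10
+	 * entries in the cache, evict_ratio is 0 and hit_ratio works out at
+	 * 1/10 * 10 - 10/1000 = 0.99; if only 5 entries fit, evict_ratio is 0.5
+	 * and hit_ratio drops to 1/10 * 5 - 10/1000 = 0.49.
+	 */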
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4141,6 +4284,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..596c2a053c 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,249 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * find_resultcache_hashop
+ *		Find the hash equals operator for typeoid.
+ *
+ * 'rinfo' must be the RestrictInfo for the qual that we're looking up the
+ * hash equals operator for.
+ *
+ * The given rinfo may have been previously determined to be hash-joinable.
+ * In that case we can simply return its hashjoinoperator.  If the rinfo was
+ * not determined to be hash-joinable, it may still be usable for result
+ * cache; we just need to check whether there's a valid hash operator for
+ * the given type.
+ */
+static inline Oid
+find_resultcache_hashop(RestrictInfo *rinfo, Oid typeoid)
+{
+	TypeCacheEntry *typentry;
+
+	/*
+	 * Since equality joins are common, it seems worth seeing if this is
+	 * already set to what we need.
+	 */
+	if (OidIsValid(rinfo->hashjoinoperator))
+		return rinfo->hashjoinoperator;
+
+	/* Reject the qual if there are volatile functions */
+	if (rinfo->has_volatile)
+		return InvalidOid;
+
+	/* Perform a manual lookup */
+	typentry = lookup_type_cache(typeoid, TYPECACHE_HASH_PROC |
+										  TYPECACHE_EQ_OPR);
+
+	if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		return InvalidOid;
+
+	return typentry->eq_opr;
+}
+
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if it's valid to use a ResultCache node to cache inner rows.
+ *
+ * We also fetch the outer-side exprs and check for a valid hashable equality
+ * operator for each one.  Returns true and sets the 'param_exprs' and
+ * 'operators' output parameters if caching is possible.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	ListCell   *lc;
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Oid			hasheqop;
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* We only support OpExprs with 2 args */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) linitial(opexpr->args);
+			else
+				expr = (Node *) lsecond(opexpr->args);
+
+			/* see if there's a valid hash equals operator for this type */
+			hasheqop = find_resultcache_hashop(rinfo, exprType(expr));
+
+			/* can't use result cache without a valid hash equals operator */
+			if (!OidIsValid(hasheqop))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, hasheqop);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+		TypeCacheEntry *typentry;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			bms_free(var_relids);
+			return false;
+		}
+		bms_free(var_relids);
+
+		/* Reject if there are any volatile functions */
+		if (contain_volatile_functions(expr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* can't use result cache without a valid hash equals operator */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We're okay to use result cache */
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+	ListCell   *lc;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget))
+		return NULL;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo))
+			return NULL;
+	}
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1725,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1734,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1904,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1929,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..5d0e908d05 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -276,6 +279,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -451,6 +459,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1524,6 +1537,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6442,6 +6505,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -7028,6 +7113,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -7073,6 +7159,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..9c166f621d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -751,6 +751,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..1ad44e6ead 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2754,6 +2754,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d5c66780ac..3f654e1155 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1576,6 +1576,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3876,6 +3926,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4094,6 +4155,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 855076b1fd..e1425270df 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1019,6 +1019,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f46c2dd7a8..1f54e1c2f4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
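
The new GUC defaults to on and can be toggled per session like the other
enable_* planner settings, which is handy for comparing plans.  A minimal
example, using a query borrowed from the regression tests:

    SET enable_resultcache TO off;
    EXPLAIN (COSTS OFF)
    SELECT count(*) FROM tenk1 a, tenk1 b WHERE a.hundred = b.thousand;
    RESET enable_resultcache;

With the setting off, the planner simply never generates Result Cache paths,
so the previous nested loop / hash join behaviour comes back.
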
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..a71b0e5242 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1999,6 +2000,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e22df890ef..f0b3cc54f0 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -74,6 +74,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -132,6 +133,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -242,6 +244,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 84e2fe186d..c58bd121c4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1488,6 +1488,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 95292d7573..678f53a807 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -775,6 +775,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 1be93be098..67f925e793 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..fe8a2dbd39 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 2c818d9253..dcdb7526a4 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2584,6 +2584,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2599,6 +2600,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
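
The new regression test file below also shows how to read the per-node
statistics that EXPLAIN ANALYZE now prints for a Result Cache.  In the first
test, for example, the outer scan produces 1000 rows but t2.twenty has only
20 distinct values, so we expect 20 cache misses (one per distinct key) and
980 hits, and the inner index scan accordingly reports loops=20 rather than
loops=1000.  Evictions counts entries removed to free memory, and Overflows
counts scans whose results could not be stored even after evicting everything
else.
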
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..c8706110c3
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  To allow validation even when a measure happens to be zero, we
+-- replace zero values with "Zero" and all other numbers with 'N'.  Hits and
+-- misses are masked only when the hide_hitmiss parameter is true.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 6d048e309c..a243b862d0 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -110,10 +110,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e280198b17..585814ad9e 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -115,7 +115,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6a57e889a1..577e173d32 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -201,6 +201,7 @@ test: partition_aggregate
 test: partition_info
 test: tuplesort
 test: explain
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index f9579af19a..287acbf694 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1098,9 +1098,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..b352f21ba1
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

v16-0005-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain)
From de660227a3c8c80d198fe186c9fc6e81e9e30780 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v16 5/5] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 4ff8000003..4d6cd9ecfe 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns the cache entry
+ *		belonging to 'entry', which may have changed address by shuffling the
+ *		deleted entries back to their optimal position.  Returns NULL if the
+ *		attempt to free enough memory resulted in 'entry' itself being evicted
+ *		from the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

#97David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#96)
5 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, 15 Mar 2021 at 23:57, David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Mar 2021 at 14:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm -1 on doing it exactly that way, because you're expending
the cost of those lookups without certainty that you need the answer.
I had in mind something more like the way that we cache selectivity
estimates in RestrictInfo, in which the value is cached when first
demanded and then re-used on subsequent checks --- see in
clause_selectivity_ext, around line 750. You do need a way for the
field to have a "not known yet" value, but that's not hard. Moreover,
this sort of approach can be less invasive than what you did here,
because the caching behavior can be hidden inside
contain_volatile_functions, rather than having all the call sites
know about it explicitly.

I coded up something more along the lines of what I think you had in
mind for the 0001 patch.

I've now cleaned up the 0001 patch. I ended up changing a few places
where we pass the RestrictInfo->clause to contain_volatile_functions()
to instead pass the RestrictInfo itself so that there's a possibility
of caching the volatility property for a subsequent call.
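
To make the approach easier to see without reading the whole diff, here's a
minimal sketch of the lazy-caching pattern; the has_volatile field and the
VOLATILITY_* values come from the attached v17-0001 patch, while the helper
function name is only illustrative (the patch itself does this inside
contain_volatile_functions_walker() when it's handed a RestrictInfo):

/*
 * Illustrative only: compute contain_volatile_functions() on first demand
 * and remember the answer in the RestrictInfo via a tri-state flag.
 */
static bool
restrictinfo_clause_is_volatile(RestrictInfo *rinfo)
{
    if (rinfo->has_volatile == VOLATILITY_UNKNOWN)
    {
        /* Not determined yet; check the clause and cache the result. */
        if (contain_volatile_functions((Node *) rinfo->clause))
            rinfo->has_volatile = VOLATILITY_VOLATILE;
        else
            rinfo->has_volatile = VOLATILITY_NOVOLATILE;
    }

    return (rinfo->has_volatile == VOLATILITY_VOLATILE);
}

Subsequent calls just read the cached flag rather than walking the clause
tree again, which is where the savings come from during the join search.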

I also made a pass over the remaining patches, and I think the 0004 patch
is ready to go, aside from the name, "Result Cache". We should decide
before RC1 whether the enable_resultcache switch should be on or off by
default.

Does anyone care to have a final look at these patches? I'd like to
start pushing them fairly soon.

David

Attachments:

v17-0001-Cache-PathTarget-and-RestrictInfo-s-volatility.patch (text/plain)
From f256cc2d810dd3247374b6a20d9c15eb1c9b01ea Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Wed, 10 Mar 2021 22:57:33 +1300
Subject: [PATCH v17 1/5] Cache PathTarget and RestrictInfo's volatility

This aims to reduce the number of times we make calls to
contain_volatile_functions().  It does not save us much with the existing
set of calls to contain_volatile_functions(); however, it will save a
significant number of calls in an upcoming patch which must check this
during the join search.
---
 src/backend/nodes/copyfuncs.c             |  1 +
 src/backend/nodes/outfuncs.c              |  2 +
 src/backend/optimizer/path/allpaths.c     | 40 ++++++++++---------
 src/backend/optimizer/plan/initsplan.c    |  4 +-
 src/backend/optimizer/util/clauses.c      | 47 +++++++++++++++++++++++
 src/backend/optimizer/util/restrictinfo.c |  7 ++++
 src/backend/optimizer/util/tlist.c        | 17 ++++++++
 src/include/nodes/pathnodes.h             | 16 ++++++++
 8 files changed, 114 insertions(+), 20 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2c20541e92..a3d046794e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2310,6 +2310,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_SCALAR_FIELD(can_join);
 	COPY_SCALAR_FIELD(pseudoconstant);
 	COPY_SCALAR_FIELD(leakproof);
+	COPY_SCALAR_FIELD(has_volatile);
 	COPY_SCALAR_FIELD(security_level);
 	COPY_BITMAPSET_FIELD(clause_relids);
 	COPY_BITMAPSET_FIELD(required_relids);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 305311d4a7..8b04f7be74 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2473,6 +2473,7 @@ _outPathTarget(StringInfo str, const PathTarget *node)
 	WRITE_FLOAT_FIELD(cost.startup, "%.2f");
 	WRITE_FLOAT_FIELD(cost.per_tuple, "%.2f");
 	WRITE_INT_FIELD(width);
+	WRITE_ENUM_FIELD(has_volatile_expr, VolatileFunctionStatus);
 }
 
 static void
@@ -2497,6 +2498,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_BOOL_FIELD(can_join);
 	WRITE_BOOL_FIELD(pseudoconstant);
 	WRITE_BOOL_FIELD(leakproof);
+	WRITE_ENUM_FIELD(has_volatile, VolatileFunctionStatus);
 	WRITE_UINT_FIELD(security_level);
 	WRITE_BITMAPSET_FIELD(clause_relids);
 	WRITE_BITMAPSET_FIELD(required_relids);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..59f495d743 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -134,7 +134,8 @@ static void check_output_expressions(Query *subquery,
 static void compare_tlist_datatypes(List *tlist, List *colTypes,
 									pushdown_safety_info *safetyInfo);
 static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
-static bool qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
+static bool qual_is_pushdown_safe(Query *subquery, Index rti,
+								  RestrictInfo *rinfo,
 								  pushdown_safety_info *safetyInfo);
 static void subquery_push_qual(Query *subquery,
 							   RangeTblEntry *rte, Index rti, Node *qual);
@@ -2177,11 +2178,12 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		foreach(l, rel->baserestrictinfo)
 		{
 			RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
-			Node	   *clause = (Node *) rinfo->clause;
 
 			if (!rinfo->pseudoconstant &&
-				qual_is_pushdown_safe(subquery, rti, clause, &safetyInfo))
+				qual_is_pushdown_safe(subquery, rti, rinfo, &safetyInfo))
 			{
+				Node	   *clause = (Node *) rinfo->clause;
+
 				/* Push it down */
 				subquery_push_qual(subquery, rte, rti, clause);
 			}
@@ -3390,37 +3392,39 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
 }
 
 /*
- * qual_is_pushdown_safe - is a particular qual safe to push down?
+ * qual_is_pushdown_safe - is a particular rinfo safe to push down?
  *
- * qual is a restriction clause applying to the given subquery (whose RTE
+ * rinfo is a restriction clause applying to the given subquery (whose RTE
  * has index rti in the parent query).
  *
  * Conditions checked here:
  *
- * 1. The qual must not contain any SubPlans (mainly because I'm not sure
- * it will work correctly: SubLinks will already have been transformed into
- * SubPlans in the qual, but not in the subquery).  Note that SubLinks that
- * transform to initplans are safe, and will be accepted here because what
- * we'll see in the qual is just a Param referencing the initplan output.
+ * 1. rinfo's clause must not contain any SubPlans (mainly because it's
+ * unclear that it will work correctly: SubLinks will already have been
+ * transformed into SubPlans in the qual, but not in the subquery).  Note that
+ * SubLinks that transform to initplans are safe, and will be accepted here
+ * because what we'll see in the qual is just a Param referencing the initplan
+ * output.
  *
- * 2. If unsafeVolatile is set, the qual must not contain any volatile
+ * 2. If unsafeVolatile is set, rinfo's clause must not contain any volatile
  * functions.
  *
- * 3. If unsafeLeaky is set, the qual must not contain any leaky functions
- * that are passed Var nodes, and therefore might reveal values from the
- * subquery as side effects.
+ * 3. If unsafeLeaky is set, rinfo's clause must not contain any leaky
+ * functions that are passed Var nodes, and therefore might reveal values from
+ * the subquery as side effects.
  *
- * 4. The qual must not refer to the whole-row output of the subquery
+ * 4. rinfo's clause must not refer to the whole-row output of the subquery
  * (since there is no easy way to name that within the subquery itself).
  *
- * 5. The qual must not refer to any subquery output columns that were
+ * 5. rinfo's clause must not refer to any subquery output columns that were
  * found to be unsafe to reference by subquery_is_pushdown_safe().
  */
 static bool
-qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
+qual_is_pushdown_safe(Query *subquery, Index rti, RestrictInfo *rinfo,
 					  pushdown_safety_info *safetyInfo)
 {
 	bool		safe = true;
+	Node	   *qual = (Node *) rinfo->clause;
 	List	   *vars;
 	ListCell   *vl;
 
@@ -3430,7 +3434,7 @@ qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
 
 	/* Refuse volatile quals if we found they'd be unsafe (point 2) */
 	if (safetyInfo->unsafeVolatile &&
-		contain_volatile_functions(qual))
+		contain_volatile_functions((Node *) rinfo))
 		return false;
 
 	/* Refuse leaky quals if told to (point 3) */
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..20df2152ea 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2657,7 +2657,7 @@ check_mergejoinable(RestrictInfo *restrictinfo)
 	leftarg = linitial(((OpExpr *) clause)->args);
 
 	if (op_mergejoinable(opno, exprType(leftarg)) &&
-		!contain_volatile_functions((Node *) clause))
+		!contain_volatile_functions((Node *) restrictinfo))
 		restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno);
 
 	/*
@@ -2694,6 +2694,6 @@ check_hashjoinable(RestrictInfo *restrictinfo)
 	leftarg = linitial(((OpExpr *) clause)->args);
 
 	if (op_hashjoinable(opno, exprType(leftarg)) &&
-		!contain_volatile_functions((Node *) clause))
+		!contain_volatile_functions((Node *) restrictinfo))
 		restrictinfo->hashjoinoperator = opno;
 }
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index c6be4f87c2..d2c13b5e6e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -487,6 +487,53 @@ contain_volatile_functions_walker(Node *node, void *context)
 		return true;
 	}
 
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) node;
+
+		if (rinfo->has_volatile == VOLATILITY_NOVOLATILE)
+			return false;
+		else if (rinfo->has_volatile == VOLATILITY_VOLATILE)
+			return true;
+		else
+		{
+			bool hasvolatile;
+
+			hasvolatile = contain_volatile_functions_walker((Node *) rinfo->clause,
+															context);
+			if (hasvolatile)
+				rinfo->has_volatile = VOLATILITY_VOLATILE;
+			else
+				rinfo->has_volatile = VOLATILITY_NOVOLATILE;
+
+			return hasvolatile;
+		}
+	}
+
+	if (IsA(node, PathTarget))
+	{
+		PathTarget *target = (PathTarget *) node;
+
+		if (target->has_volatile_expr == VOLATILITY_NOVOLATILE)
+			return false;
+		else if (target->has_volatile_expr == VOLATILITY_VOLATILE)
+			return true;
+		else
+		{
+			bool hasvolatile;
+
+			hasvolatile = contain_volatile_functions_walker((Node *) target->exprs,
+															context);
+
+			if (hasvolatile)
+				target->has_volatile_expr = VOLATILITY_VOLATILE;
+			else
+				target->has_volatile_expr = VOLATILITY_NOVOLATILE;
+
+			return hasvolatile;
+		}
+	}
+
 	/*
 	 * See notes in contain_mutable_functions_walker about why we treat
 	 * MinMaxExpr, XmlExpr, and CoerceToDomain as immutable, while
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index eb113d94c1..59ff35926e 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -137,6 +137,13 @@ make_restrictinfo_internal(PlannerInfo *root,
 	else
 		restrictinfo->leakproof = false;	/* really, "don't know" */
 
+	/*
+	 * Mark volatility as unknown.  The contain_volatile_functions function
+	 * will determine if there are any volatile functions when called for the
+	 * first time with this RestrictInfo.
+	 */
+	restrictinfo->has_volatile = VOLATILITY_UNKNOWN;
+
 	/*
 	 * If it's a binary opclause, set up left/right relids info. In any case
 	 * set up the total clause relids info.
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 89853a0630..8a26288070 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -623,6 +623,13 @@ make_pathtarget_from_tlist(List *tlist)
 		i++;
 	}
 
+	/*
+	 * Mark volatility as unknown.  The contain_volatile_functions function
+	 * will determine if there are any volatile functions when called for the
+	 * first time with this PathTarget.
+	 */
+	target->has_volatile_expr = VOLATILITY_UNKNOWN;
+
 	return target;
 }
 
@@ -724,6 +731,16 @@ add_column_to_pathtarget(PathTarget *target, Expr *expr, Index sortgroupref)
 		target->sortgrouprefs = (Index *) palloc0(nexprs * sizeof(Index));
 		target->sortgrouprefs[nexprs - 1] = sortgroupref;
 	}
+
+	/*
+	 * Reset has_volatile_expr to UNKNOWN.  We just leave it up to
+	 * contain_volatile_functions to set this properly again.  Technically we
+	 * could save some effort here and just check the new Expr, but it seems
+	 * better to keep the logic for setting this flag in one location rather
+	 * than duplicating the logic here.
+	 */
+	if (target->has_volatile_expr == VOLATILITY_NOVOLATILE)
+		target->has_volatile_expr = VOLATILITY_UNKNOWN;
 }
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e4aed43538..d485b4207a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1056,6 +1056,17 @@ typedef struct PathKey
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
 } PathKey;
 
+/*
+ * VolatileFunctionStatus -- allows nodes to cache their
+ * contain_volatile_functions properties. VOLATILITY_UNKNOWN means not yet
+ * determined.
+ */
+typedef enum VolatileFunctionStatus
+{
+	VOLATILITY_UNKNOWN = 0,
+	VOLATILITY_VOLATILE,
+	VOLATILITY_NOVOLATILE
+} VolatileFunctionStatus;
 
 /*
  * PathTarget
@@ -1087,6 +1098,8 @@ typedef struct PathTarget
 	Index	   *sortgrouprefs;	/* corresponding sort/group refnos, or 0 */
 	QualCost	cost;			/* cost of evaluating the expressions */
 	int			width;			/* estimated avg width of result tuples */
+	VolatileFunctionStatus	has_volatile_expr;	/* indicates if exprs contain
+												 * any volatile functions. */
 } PathTarget;
 
 /* Convenience macro to get a sort/group refno from a PathTarget */
@@ -2017,6 +2030,9 @@ typedef struct RestrictInfo
 
 	bool		leakproof;		/* true if known to contain no leaked Vars */
 
+	VolatileFunctionStatus	has_volatile;	/* to indicate if clause contains
+											 * any volatile functions. */
+
 	Index		security_level; /* see comment above */
 
 	/* The set of relids (varnos) actually referenced in the clause: */
-- 
2.27.0

v17-0002-Allow-estimate_num_groups-to-pass-back-further-d.patch (text/plain)
From af7e71907ee9ff5641c805cab3fc120c0b62c939 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v17 2/5] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() through which it
can set bits in a flags variable to pass back additional details to the
caller which may be useful for decision making.

For now, the only new flag is one which indicates if the estimation
fell back on using the hard-coded constants in any part of the estimation.
Callers may like to change their behavior if this is set, and this gives
them the ability to do so. Callers may pass the flag pointer as NULL if
they have no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..ed33d819e7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3086,7 +3086,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c81e2cf244..5cca276a9d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1971,7 +1971,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 28b40dd905..f1d8c5f95b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3717,7 +3717,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3742,7 +3743,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3758,7 +3760,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4807,7 +4809,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..d5c66780ac 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1713,6 +1713,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 52314d3aa1..2306602a51 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3569,6 +3581,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

v17-0003-Allow-users-of-simplehash.h-to-perform-direct-de.patch (text/plain)
From 0c88f5829c6b415cbd76db0a02491f8b509993b3 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v17 3/5] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
already has the element which it would like to delete, and so can remove it
without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v17-0004-Add-Result-Cache-executor-node.patch (text/plain)
From 0998b6b299ec1b7078a03f65dfb322ee982a052b Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v17 4/5] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which limits itself to not exceeding
work_mem in size.  We maintain a dlist of keys for this cache and when we
require more space in the table for new entries, we start removing entries
starting with the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Knowing this
relies on having good ndistinct estimates on the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   25 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  284 +++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  153 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   78 ++
 43 files changed, 2823 insertions(+), 186 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..613c46f886 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2139,22 +2141,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..4a544a3ab5 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..27bc74e450 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1770,8 +1770,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4903,6 +4904,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans to the underlying
+        nodes to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..e42983da02 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1284,6 +1286,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1993,6 +1998,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3066,6 +3075,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 74ac59faa1..c6bffaf199 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 4543ac79ed..18cbfdaeac 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -254,6 +255,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use.
+ * Must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
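For clarity, the calling pattern intended for the expression built above is roughly the following sketch (the helper name is hypothetical; the real caller is ResultCacheHash_equal() in nodeResultCache.c further down):

static bool
params_match(ExprState *eq_expr, ExprContext *econtext,
			 TupleTableSlot *cached, TupleTableSlot *probe)
{
	/* inner tuple = stored cache key values, outer tuple = probe values */
	econtext->ecxt_innertuple = cached;
	econtext->ecxt_outertuple = probe;

	/* true when every key column is "not distinct from" its counterpart */
	return ExecQualAndReset(eq_expr, econtext);
}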
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 29766d8196..9f8c7582e0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -325,6 +326,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -713,6 +719,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..35d802524c
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache is implemented as a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that the least recently used entries are
+ * always found at the head of that list, which is where eviction starts.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
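+/*
+ * Typical state transitions:
+ *
+ * cache hit:	RC_CACHE_LOOKUP -> RC_CACHE_FETCH_NEXT_TUPLE -> RC_END_OF_SCAN
+ * cache miss:	RC_CACHE_LOOKUP -> RC_FILLING_CACHE -> RC_END_OF_SCAN
+ * no memory:	RC_CACHE_LOOKUP or RC_FILLING_CACHE -> RC_CACHE_BYPASS_MODE
+ *				-> RC_END_OF_SCAN
+ *
+ * ExecReScanResultCache() resets the state machine back to RC_CACHE_LOOKUP.
+ */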
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple: stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
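+	/*
+	 * e.g. for two non-NULL key columns the final value is
+	 * murmurhash32(rot1(h(key0)) ^ h(key1)), where rot1() is the 1-bit left
+	 * rotation performed above.
+	 */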
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match. e.g. A join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This lets us mark the entry complete early.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a3d046794e..f2826d61b5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -947,6 +947,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -5009,6 +5036,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8b04f7be74..8e1670361a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -845,6 +845,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1918,6 +1933,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3882,6 +3912,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4116,6 +4149,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 9b8f81c523..a161841985 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2208,6 +2208,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2896,6 +2916,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 59f495d743..8cada9b7fd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4032,6 +4032,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5cca276a9d..97255b5c44 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -141,6 +142,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2403,6 +2405,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4143,6 +4286,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
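
To make the hit and eviction ratio arithmetic in cost_resultcache_rescan() a
little more concrete, here is a worked example with made-up numbers (not taken
from any real plan): suppose calls = 1000 rescans, ndistinct = 100 distinct
parameter sets, and est_cache_entries = 80 entries fitting in hash_mem at
once.  Then:

    evict_ratio = 1.0 - Min(80, 100) / 100              = 0.2
    hit_ratio   = 1.0 / 100 * Min(80, 100) - 100 / 1000 = 0.8 - 0.1 = 0.7

so the rescan total cost comes out at roughly input_total_cost * 0.3, plus the
per-lookup, eviction and caching charges described in the comments above.
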
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..a5461f5d03 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving inner_unique here to allow a
+			 * ResultCache to be considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,250 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * find_resultcache_hashop
+ *		Find the hash equals operator for typeoid.
+ *
+ * 'rinfo' must be the RestrictInfo for the qual that we're looking up the
+ * hash equals operator for.
+ *
+ * The given rinfo may have been previously determined to be hash-joinable.
+ * In that case we can simply return its hashjoinoperator.  If the rinfo was
+ * not determined to be hash-joinable, it may still be usable by result
+ * cache; we just need to check whether there's a valid hash operator for the
+ * given type.
+ */
+static inline Oid
+find_resultcache_hashop(RestrictInfo *rinfo, Oid typeoid)
+{
+	TypeCacheEntry *typentry;
+
+	/*
+	 * Since equality joins are common, it seems worth seeing if this is
+	 * already set to what we need.
+	 */
+	if (OidIsValid(rinfo->hashjoinoperator))
+		return rinfo->hashjoinoperator;
+
+	/* Reject the qual if there are volatile functions */
+	if (contain_volatile_functions((Node *) rinfo))
+		return InvalidOid;
+
+	/* Perform a manual lookup */
+	typentry = lookup_type_cache(typeoid, TYPECACHE_HASH_PROC |
+										  TYPECACHE_EQ_OPR);
+
+	if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		return InvalidOid;
+
+	return typentry->eq_opr;
+}
+
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if param_info and innerrel's lateral_vars can be hashed.
+ *		Returns true if hashing is possible, otherwise returns false.
+ *
+ * Additionally, we collect the outer expressions and the hash operators for
+ * each parameter to innerrel.  These are set in 'param_exprs' and
+ * 'operators' when we return true.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	ListCell   *lc;
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Oid			hasheqop;
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* We only support OpExprs with 2 args */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) linitial(opexpr->args);
+			else
+				expr = (Node *) lsecond(opexpr->args);
+
+			/* see if there's a valid hash equals operator for this type */
+			hasheqop = find_resultcache_hashop(rinfo, exprType(expr));
+
+			/* can't use result cache without a valid hash equals operator */
+			if (!OidIsValid(hasheqop))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, hasheqop);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+		TypeCacheEntry *typentry;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			bms_free(var_relids);
+			return false;
+		}
+		bms_free(var_relids);
+
+		/* Reject if there are any volatile functions */
+		if (contain_volatile_functions(expr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* can't use result cache without a valid hash equals operator */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We're okay to use result cache */
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop of 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+	ListCell   *lc;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget))
+		return NULL;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo))
+			return NULL;
+	}
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1726,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1735,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1905,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1930,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..5d0e908d05 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -276,6 +279,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -451,6 +459,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1524,6 +1537,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6442,6 +6505,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -7028,6 +7113,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -7073,6 +7159,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..9c166f621d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -751,6 +751,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..1ad44e6ead 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2754,6 +2754,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d5c66780ac..3f654e1155 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1576,6 +1576,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3876,6 +3926,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4094,6 +4155,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b36a31a47..2d1472eca7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1028,6 +1028,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 86425965d0..73730f0b74 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *	  prototypes for nodeResultCache.c
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
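
As a minimal sketch of how dlist_move_tail() is meant to be used for LRU
maintenance (the struct and function names below are hypothetical, not part of
the patch): on a cache hit the entry is moved to the tail of the LRU list, so
eviction can always take its victim from the head.

    #include "lib/ilist.h"

    typedef struct CacheEntry
    {
        dlist_node  lru_node;   /* membership in the LRU list */
        /* ... cache key and cached tuples would live here ... */
    } CacheEntry;

    /* On a cache hit, mark 'entry' as the most recently used entry. */
    static inline void
    cache_touch(dlist_head *lru_list, CacheEntry *entry)
    {
        dlist_move_tail(lru_list, &entry->lru_node);
    }

    /* When memory must be freed, evict from the head (least recently used). */
    static inline CacheEntry *
    cache_choose_victim(dlist_head *lru_list)
    {
        /* assumes the list is non-empty */
        return dlist_container(CacheEntry, lru_node,
                               dlist_pop_head_node(lru_list));
    }
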
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..a71b0e5242 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1999,6 +2000,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs, 'nkeys' in size */
+	Oid		   *collations;		/* collations for comparisons, 'nkeys' in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e22df890ef..f0b3cc54f0 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -74,6 +74,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -132,6 +133,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -242,6 +244,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d485b4207a..1bae126a82 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1489,6 +1489,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 95292d7573..678f53a807 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -775,6 +775,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 22e6db96b6..e99ed99a57 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -58,6 +58,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..fe8a2dbd39 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1ae0e5d939..ca06d41dd0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2584,6 +2584,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2599,6 +2600,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..c8706110c3
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,153 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                          explain_resultcache                                           
+--------------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
+         Workers Planned: 2
+         Workers Launched: 2
+         ->  Partial Aggregate (actual rows=1 loops=3)
+               ->  Nested Loop (actual rows=333 loops=3)
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1 (actual rows=333 loops=3)
+                           Recheck Cond: (unique1 < 1000)
+                           Heap Blocks: exact=333
+                           ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache (actual rows=1 loops=1000)
+                           Cache Key: t1.twenty
+                           Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                                 Index Cond: (unique1 = t1.twenty)
+                                 Heap Fetches: 0
+(17 rows)
+
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a62bf5dc92..3b58039e3d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -111,10 +111,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(19 rows)
+(20 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 70c38309d7..74bd545958 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -115,7 +115,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index d81d04136c..c84c9cc2ad 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -202,6 +202,7 @@ test: partition_info
 test: tuplesort
 test: explain
 test: compression
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index eb53668299..eb80a2fe06 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1098,9 +1098,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..b352f21ba1
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,78 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Ensure the cache works as expected with a parallel scan.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+RESET min_parallel_table_scan_size;
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
-- 
2.27.0

v17-0005-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain)
From 4811d5521d8a9c26af97c6e817fa43d56d50e155 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v17 5/5] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 35d802524c..ac4a5d04e8 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns the (possibly moved)
+ *		address of 'entry'; evicting other entries may have shuffled elements
+ *		back to their optimal positions, in which case 'entry' will have
+ *		changed address.  Returns NULL if the attempt to free enough memory
+ *		resulted in 'entry' itself being evicted from the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

#98David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#97)
4 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 24 Mar 2021 at 00:42, David Rowley <dgrowleyml@gmail.com> wrote:

I've now cleaned up the 0001 patch. I ended up changing a few places
that passed RestrictInfo->clause to contain_volatile_functions() so
that they now pass the RestrictInfo itself, which opens up the
possibility of caching the volatility property for a subsequent call.

I also made a pass over the remaining patches, and for the 0004 patch,
aside from the name "Result Cache", I think it's ready to go. We
should consider before RC1 whether the enable_resultcache switch
should be on or off by default.

Does anyone care to have a final look at these patches? I'd like to
start pushing them fairly soon.

I've now pushed the 0001 patch to cache the volatility of PathTarget
and RestrictInfo.

I'll be looking at the remaining patches over the next few days.

Attached is a rebased set of patches on top of current master. The
only change is to the 0003 patch (was 0004), which had an unstable
regression test for a parallel plan with a Result Cache. I've swapped
the unstable test for one that shouldn't fail randomly depending on
whether a parallel worker did any work or not.
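
In case anyone wants to eyeball the new node without wading through
the regression output, here's a minimal sketch (assuming the attached
patches are applied and the standard regression database with the
tenk1 table is loaded) that reuses the query and GUC settings from the
resultcache test to compare plans with and without the cache:

SET enable_hashjoin TO off;  -- as in the test, to get a parameterized
                             -- nested loop plan
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
SELECT COUNT(*), AVG(t1.unique1) FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
WHERE t2.unique1 < 1000;  -- expect a Result Cache above the inner scan

SET enable_resultcache TO off;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
SELECT COUNT(*), AVG(t1.unique1) FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
WHERE t2.unique1 < 1000;  -- same query without the cache, for comparison

RESET enable_resultcache;
RESET enable_hashjoin;

With the cache enabled, the interesting parts of the EXPLAIN ANALYZE
output are the Hits, Misses, Evictions, Overflows and Memory Usage
figures reported under the Result Cache node.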

David

Attachments:

v18-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch (text/plain)
From 4ea718cda18a71e475d9d2292913359d15bcb61d Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 16:06:36 +1200
Subject: [PATCH v18 1/4] Allow estimate_num_groups() to pass back further
 details about the estimation

Here we add a new output parameter to estimate_num_groups() through which
it can set bits in a flags variable to pass additional details back to the
caller, which may be useful for decision making.

For now, the only new flag is one which indicates whether any part of the
estimation fell back on using the hard-coded default constants. Callers
may wish to change their behavior if this is set, and this gives them the
ability to do so. Callers may pass the flag pointer as NULL if they have
no interest in any of the flags.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/optimizer/path/costsize.c  |  3 ++-
 src/backend/optimizer/path/indxpath.c  |  1 +
 src/backend/optimizer/plan/planner.c   | 10 ++++++----
 src/backend/optimizer/prep/prepunion.c |  1 +
 src/backend/optimizer/util/pathnode.c  |  1 +
 src/backend/utils/adt/selfuncs.c       | 22 +++++++++++++++++++++-
 src/include/utils/selfuncs.h           | 17 ++++++++++++++++-
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..ed33d819e7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3086,7 +3086,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			numGroups = estimate_num_groups(root,
 											get_sortgrouplist_exprs(root->parse->groupClause,
 																	fpinfo->grouped_tlist),
-											input_rows, NULL);
+											input_rows, NULL, NULL);
 
 			/*
 			 * Get the retrieved_rows and rows estimates.  If there are HAVING
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a25b674a19..b92c948588 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1969,7 +1969,8 @@ cost_incremental_sort(Path *path,
 
 	/* Estimate number of groups with equal presorted keys. */
 	if (!unknown_varno)
-		input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL);
+		input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
+										   NULL, NULL);
 
 	group_tuples = input_tuples / input_groups;
 	group_input_run_cost = input_run_cost / input_groups;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..53b24e9e8c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1990,6 +1990,7 @@ adjust_rowcount_for_semijoins(PlannerInfo *root,
 			nunique = estimate_num_groups(root,
 										  sjinfo->semi_rhs_exprs,
 										  nraw,
+										  NULL,
 										  NULL);
 			if (rowcount > nunique)
 				rowcount = nunique;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f529d107d2..0886bf4ae8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3702,7 +3702,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					rollup->numGroups += numGroups;
@@ -3727,7 +3728,8 @@ get_number_of_groups(PlannerInfo *root,
 					double		numGroups = estimate_num_groups(root,
 																groupExprs,
 																path_rows,
-																&gset);
+																&gset,
+																NULL);
 
 					gs->numGroups = numGroups;
 					gd->dNumHashGroups += numGroups;
@@ -3743,7 +3745,7 @@ get_number_of_groups(PlannerInfo *root,
 												 target_list);
 
 			dNumGroups = estimate_num_groups(root, groupExprs, path_rows,
-											 NULL);
+											 NULL, NULL);
 		}
 	}
 	else if (parse->groupingSets)
@@ -4792,7 +4794,7 @@ create_distinct_paths(PlannerInfo *root,
 												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
-											  NULL);
+											  NULL, NULL);
 	}
 
 	/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index becdcbb872..037dfaacfd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -338,6 +338,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				*pNumGroups = estimate_num_groups(subroot,
 												  get_tlist_exprs(subquery->targetList, false),
 												  subpath->rows,
+												  NULL,
 												  NULL);
 		}
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..d5c66780ac 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1713,6 +1713,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.rows = estimate_num_groups(root,
 											  sjinfo->semi_rhs_exprs,
 											  rel->rows,
+											  NULL,
 											  NULL);
 	numCols = list_length(sjinfo->semi_rhs_exprs);
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7e41bc5641..0963e2701c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3241,6 +3241,7 @@ typedef struct
 	Node	   *var;			/* might be an expression, not just a Var */
 	RelOptInfo *rel;			/* relation it belongs to */
 	double		ndistinct;		/* # distinct values */
+	bool		isdefault;		/* true if DEFAULT_NUM_DISTINCT was used */
 } GroupVarInfo;
 
 static List *
@@ -3287,6 +3288,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 	varinfo->var = var;
 	varinfo->rel = vardata->rel;
 	varinfo->ndistinct = ndistinct;
+	varinfo->isdefault = isdefault;
 	varinfos = lappend(varinfos, varinfo);
 	return varinfos;
 }
@@ -3311,6 +3313,12 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  *	pgset - NULL, or a List** pointing to a grouping set to filter the
  *		groupExprs against
  *
+ * Outputs:
+ *	estinfo - When passed as non-NULL, the function will set bits in the
+ *		"flags" field in order to provide callers with additional information
+ *		about the estimation.  Currently, we only set the SELFLAG_USED_DEFAULT
+ *		bit if we used any default values in the estimation.
+ *
  * Given the lack of any cross-correlation statistics in the system, it's
  * impossible to do anything really trustworthy with GROUP BY conditions
  * involving multiple Vars.  We should however avoid assuming the worst
@@ -3358,7 +3366,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
  */
 double
 estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
-					List **pgset)
+					List **pgset, EstimationInfo *estinfo)
 {
 	List	   *varinfos = NIL;
 	double		srf_multiplier = 1.0;
@@ -3366,6 +3374,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	ListCell   *l;
 	int			i;
 
+	/* Zero the estinfo output parameter, if non-NULL */
+	if (estinfo != NULL)
+		memset(estinfo, 0, sizeof(EstimationInfo));
+
 	/*
 	 * We don't ever want to return an estimate of zero groups, as that tends
 	 * to lead to division-by-zero and other unpleasantness.  The input_rows
@@ -3577,6 +3589,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					if (relmaxndistinct < varinfo2->ndistinct)
 						relmaxndistinct = varinfo2->ndistinct;
 					relvarcount++;
+
+					/*
+					 * When varinfo2's isdefault is set then we'd better set
+					 * the SELFLAG_USED_DEFAULT bit in the EstimationInfo.
+					 */
+					if (estinfo != NULL && varinfo2->isdefault)
+						estinfo->flags |= SELFLAG_USED_DEFAULT;
+
 				}
 
 				/* we're done with this relation */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f9be539602..78cde58acc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -68,6 +68,20 @@
 			p = 1.0; \
 	} while (0)
 
+/*
+ * A set of flags which some selectivity estimation functions can pass back to
+ * callers to provide further details about some assumptions which were made
+ * during the estimation.
+ */
+#define SELFLAG_USED_DEFAULT		(1 << 0)	/* Estimation fell back on one
+												 * of the DEFAULTs as defined
+												 * above. */
+
+typedef struct EstimationInfo
+{
+	uint32			flags;		/* Flags, as defined above to mark special
+								 * properties of the estimation. */
+} EstimationInfo;
 
 /* Return data from examine_variable and friends */
 typedef struct VariableStatData
@@ -197,7 +211,8 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
 							 Selectivity *rightstart, Selectivity *rightend);
 
 extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
-								  double input_rows, List **pgset);
+								  double input_rows, List **pgset,
+								  EstimationInfo *estinfo);
 
 extern void estimate_hash_bucket_stats(PlannerInfo *root,
 									   Node *hashkey, double nbuckets,
-- 
2.27.0

v18-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch (text/plain)
From 8a2bea784ddbd6655e8a829853547e5d2d87938d Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:07:34 +1200
Subject: [PATCH v18 2/4] Allow users of simplehash.h to perform direct
 deletions

Previously simplehash.h only exposed a method to perform a hash table
delete by the key.  This required performing a hash table lookup in order
to find the element which belongs to that key.  Having the code this way
made sense for the existing callers, but in an upcoming commit, a caller
will already have the element which it would like to delete, so it can do
so without performing a lookup.
---
 src/include/lib/simplehash.h | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..da51781e98 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -110,6 +110,7 @@
 #define SH_RESET SH_MAKE_NAME(reset)
 #define SH_INSERT SH_MAKE_NAME(insert)
 #define SH_INSERT_HASH SH_MAKE_NAME(insert_hash)
+#define SH_DELETE_ITEM SH_MAKE_NAME(delete_item)
 #define SH_DELETE SH_MAKE_NAME(delete)
 #define SH_LOOKUP SH_MAKE_NAME(lookup)
 #define SH_LOOKUP_HASH SH_MAKE_NAME(lookup_hash)
@@ -217,6 +218,9 @@ SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
 
+/* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
 
@@ -829,7 +833,7 @@ SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 }
 
 /*
- * Delete entry from hash table.  Returns whether to-be-deleted key was
+ * Delete entry from hash table by key.  Returns whether to-be-deleted key was
  * present.
  */
 SH_SCOPE bool
@@ -900,6 +904,61 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	}
 }
 
+/*
+ * Delete entry from hash table by entry pointer
+ */
+SH_SCOPE void
+SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+	SH_ELEMENT_TYPE *lastentry = entry;
+	uint32		hash = SH_ENTRY_HASH(tb, entry);
+	uint32		startelem = SH_INITIAL_BUCKET(tb, hash);
+	uint32		curelem;
+
+	/* Calculate the index of 'entry' */
+	curelem = entry - &tb->data[0];
+
+	tb->members--;
+
+	/*
+	 * Backward shift following elements till either an empty element or an
+	 * element at its optimal position is encountered.
+	 *
+	 * While that sounds expensive, the average chain length is short, and
+	 * deletions would otherwise require tombstones.
+	 */
+	while (true)
+	{
+		SH_ELEMENT_TYPE *curentry;
+		uint32		curhash;
+		uint32		curoptimal;
+
+		curelem = SH_NEXT(tb, curelem, startelem);
+		curentry = &tb->data[curelem];
+
+		if (curentry->status != SH_STATUS_IN_USE)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		curhash = SH_ENTRY_HASH(tb, curentry);
+		curoptimal = SH_INITIAL_BUCKET(tb, curhash);
+
+		/* current is at optimal position, done */
+		if (curoptimal == curelem)
+		{
+			lastentry->status = SH_STATUS_EMPTY;
+			break;
+		}
+
+		/* shift */
+		memcpy(lastentry, curentry, sizeof(SH_ELEMENT_TYPE));
+
+		lastentry = curentry;
+	}
+}
+
 /*
  * Initialize iterator.
  */
@@ -1102,6 +1161,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_RESET
 #undef SH_INSERT
 #undef SH_INSERT_HASH
+#undef SH_DELETE_ITEM
 #undef SH_DELETE
 #undef SH_LOOKUP
 #undef SH_LOOKUP_HASH
-- 
2.27.0

v18-0003-Add-Result-Cache-executor-node.patch (text/plain)
From d0b9d751bb9e412153afb9ae0a4465cc7fb83966 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v18 3/4] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can use this node to sit above a parameterized path in order to cache
the tuples for commonly used sets of parameters.

The cache itself is just a hash table which is limited to work_mem in
size.  We maintain a dlist of keys for this cache, and when we require
more space in the table for new entries, we remove entries beginning with
the least recently used ones.

For parameterized nested loop joins we now consider using one of these
Result Caches in between the Nested Loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven by the expected cache hit ratio.  Knowing this relies on having
good ndistinct estimates for the nested loop parameters.

Effectively, for parameterized nested loops, when a Result Cache is used,
the join becomes a sort of hybrid of nested loop and hash joins.  This is
useful as we only need to fill the hash table (the cache) with the records
that are required during the "probe" phase.  We'll never end up hashing
anything that we don't need, which is especially useful when some items in
the table will never be looked up and a hash join's hash table would have
exceeded work_mem.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   25 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   23 +-
 src/backend/commands/explain.c                |  148 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1128 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  284 +++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 ++
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   30 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  238 ++--
 src/test/regress/expected/resultcache.out     |  159 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   85 ++
 43 files changed, 2836 insertions(+), 186 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..613c46f886 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2139,22 +2141,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..4a544a3ab5 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..44ccf2153f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1770,8 +1770,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes, and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4909,6 +4910,24 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of a result cache node for
+        parameterized nodes.  This node type allows scans of the underlying
+        node to be skipped when the results for the current parameters are
+        already in the cache.  Less commonly looked up results may be evicted
+        from the cache when more space is required for new entries.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..e42983da02 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1284,6 +1286,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1993,6 +1998,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3066,6 +3075,145 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *separator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, separator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		separator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	/* Show details from parallel workers, if any */
+	if (rcstate->shared_info != NULL)
+	{
+		for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+		{
+			ResultCacheInstrumentation *si;
+
+			si = &rcstate->shared_info->sinstrument[n];
+
+			/*
+			 * Skip workers that didn't do any work.  We needn't bother
+			 * checking for cache hits as a miss will always occur before
+			 * a cache hit.
+			 */
+			if (si->cache_misses == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			/*
+			 * Since the worker's ResultCacheState.mem_used field is
+			 * unavailable to us, ExecEndResultCache will have set the
+			 * ResultCacheInstrumentation.mem_peak field for us.  No need to
+			 * do the zero checks like we did for the serial case above.
+			 */
+			memPeakKb = (si->mem_peak + 1023) / 1024;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+								 si->cache_hits, si->cache_misses,
+								 si->cache_evictions, si->cache_overflows,
+								 memPeakKb);
+			}
+			else
+			{
+				ExplainPropertyInteger("Cache Hits", NULL,
+									   si->cache_hits, es);
+				ExplainPropertyInteger("Cache Misses", NULL,
+									   si->cache_misses, es);
+				ExplainPropertyInteger("Cache Evictions", NULL,
+									   si->cache_evictions, es);
+				ExplainPropertyInteger("Cache Overflows", NULL,
+									   si->cache_overflows, es);
+				ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+									   es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 74ac59faa1..c6bffaf199 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 4543ac79ed..18cbfdaeac 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -254,6 +255,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function oids of the equality functions to use;
+ * this must be the same length as the 'param_exprs' list.
+ * collations: collation Oids to use for equality comparison. Must be the
+ * same length as the 'param_exprs' list.
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
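
The expression built above implements the NULLs-are-equal rule via the EEOP_NOT_DISTINCT step: two cache keys match when each column pair is either both NULL or both non-NULL and equal, so a previously cached NULL parameter can be found again on a later rescan.  A minimal standalone sketch of that comparison rule, using made-up plain-C types rather than the executor's Datum/slot machinery:

#include <stdbool.h>
#include <stdio.h>

/* One cache key column: a value that may be NULL. */
typedef struct
{
	bool		isnull;
	int			value;
} KeyCol;

/*
 * The "not distinct" rule: two NULLs match, a NULL never matches a
 * non-NULL value, otherwise fall back to ordinary equality.
 */
static bool
keycol_not_distinct(KeyCol a, KeyCol b)
{
	if (a.isnull || b.isnull)
		return a.isnull && b.isnull;
	return a.value == b.value;
}

int
main(void)
{
	KeyCol		null_col = {true, 0};
	KeyCol		one = {false, 1};

	printf("%d %d %d\n",
		   keycol_not_distinct(null_col, null_col),	/* 1: NULL matches NULL */
		   keycol_not_distinct(null_col, one),			/* 0 */
		   keycol_not_distinct(one, one));				/* 1 */
	return 0;
}

In the real node the per-column equality function is the one get_opcode() returns for the cache key's hash operator (see ExecInitResultCache further down), and the whole comparison runs as a compiled ExprState over the cache's table slot and probe slot.
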
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 29766d8196..9f8c7582e0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -325,6 +326,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -713,6 +719,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..35d802524c
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1128 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.  The intention here is that
+ * a repeat scan with a parameter which has already been seen by the node can
+ * fetch tuples from the cache rather than having to re-scan the outer node
+ * all over again.  The query planner may choose to make use of one of these
+ * when it thinks rescans for previously seen values are likely enough to
+ * warrant adding the additional node.
+ *
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used entry from the
+ * cache.  We track recency by always pushing new entries, and entries we
+ * look up, onto the tail of a doubly linked list.  This means that the
+ * least recently used items always accumulate at the head of this LRU
+ * list, which is where eviction starts.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+#ifdef CACHE_VERIFY_TABLE
+	/* Can be enabled to validate the memory tracking code is behaving */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		rcstate->stats.cache_evictions += 1;	/* Update Stats */
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
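+/*
+ * ExecResultCache
+ *		ExecProcNode entry point for Result Cache.  Drives the rc_status
+ *		state machine: look up the cache using the current parameter
+ *		values, return cached tuples on a complete cache hit, otherwise
+ *		read from the outer subnode, caching tuples as we go unless we've
+ *		had to switch into bypass mode.
+ */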
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+				else
+				{
+					TupleTableSlot *outerslot;
+
+					node->stats.cache_misses += 1;	/* stats update */
+
+					if (found)
+					{
+						/*
+						 * A cache entry was found, but the scan for that
+						 * entry did not run to completion.  We'll just remove
+						 * all tuples and start again.  It might be tempting
+						 * to continue where we left off, but there's no
+						 * guarantee the outer node will produce the tuples in
+						 * the same order as it did last time.
+						 */
+						entry_purge_tuples(node, entry);
+					}
+
+					/* Scan the outer node for a tuple to cache */
+					outerNode = outerPlanState(node);
+					outerslot = ExecProcNode(outerNode);
+					if (TupIsNull(outerslot))
+					{
+						/*
+						 * cache_lookup may have returned NULL due to failure
+						 * to free enough cache space, so ensure we don't do
+						 * anything here that assumes it worked. There's no
+						 * need to go into bypass mode here as we're setting
+						 * rc_status to end of scan.
+						 */
+						if (likely(entry))
+							entry->complete = true;
+
+						node->rc_status = RC_END_OF_SCAN;
+						return NULL;
+					}
+
+					node->entry = entry;
+
+					/*
+					 * If we failed to create the entry or failed to store the
+					 * tuple in the entry, then go into bypass mode.
+					 */
+					if (unlikely(entry == NULL ||
+								 !cache_store_tuple(node, outerslot)))
+					{
+						node->stats.cache_overflows += 1;	/* stats update */
+
+						node->rc_status = RC_CACHE_BYPASS_MODE;
+
+						/*
+						 * No need to clear out last_tuple as we'll stay in
+						 * bypass mode until the end of the scan.
+						 */
+					}
+					else
+					{
+						/*
+						 * If we only expect a single row from this scan then
+						 * we can mark that we're not expecting more.  This
+						 * allows cache lookups to work even when the scan has
+						 * not been executed to completion.
+						 */
+						entry->complete = node->singlerow;
+						node->rc_status = RC_FILLING_CACHE;
+					}
+
+					slot = node->ss.ps.ps_ResultTupleSlot;
+					ExecCopySlot(slot, outerslot);
+					return slot;
+				}
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX is this worth the check?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
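+/*
+ * ExecInitResultCache
+ *		Initialize the ResultCacheState node: set up the hash and equality
+ *		support for the cache keys, the memory limit (from get_hash_mem()),
+ *		the dedicated memory context and the cache hash table itself.
+ */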
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as so.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
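+/*
+ * ExecEndResultCache
+ *		Shut down the node: copy any parallel worker stats back into shared
+ *		memory, delete the cache's memory context and shut down the outer
+ *		subplan.
+ */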
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
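+/*
+ * ExecReScanResultCache
+ *		Prepare for a rescan with a new set of parameters.  The cache
+ *		contents are kept; we just reset the state machine and forget the
+ *		entry and tuple pointers from the previous scan.
+ */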
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheEstimate
+ *
+ *		Estimate space required to propagate result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
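
To make the LRU mechanics from the header comment above more concrete (new entries and cache hits are pushed onto the tail of a doubly linked list, eviction walks from the head), here is a small standalone sketch.  The names and the fixed two-entry capacity are invented for illustration only; the real code reaches entries through a simplehash table and evicts against a get_hash_mem()-derived byte budget rather than an entry count:

#include <stdio.h>
#include <stdlib.h>

/* A toy fixed-capacity cache that evicts its least recently used entry. */
typedef struct Entry
{
	int			key;
	struct Entry *prev;			/* towards the LRU end (head) */
	struct Entry *next;			/* towards the MRU end (tail) */
} Entry;

static Entry *head;				/* least recently used entry */
static Entry *tail;				/* most recently used entry */
static int	nentries;

#define CAPACITY 2

static void
unlink_entry(Entry *e)
{
	if (e->prev)
		e->prev->next = e->next;
	else
		head = e->next;
	if (e->next)
		e->next->prev = e->prev;
	else
		tail = e->prev;
	e->prev = e->next = NULL;
}

static void
push_tail(Entry *e)
{
	e->prev = tail;
	e->next = NULL;
	if (tail)
		tail->next = e;
	else
		head = e;
	tail = e;
}

/* Look a key up; on a hit, move it to the MRU end so it is evicted last. */
static Entry *
lookup(int key)
{
	Entry	   *e;

	/* a linear scan stands in for the real code's hash table lookup */
	for (e = head; e != NULL; e = e->next)
	{
		if (e->key == key)
		{
			unlink_entry(e);
			push_tail(e);
			return e;
		}
	}
	return NULL;
}

/* Insert a new key, evicting from the head of the LRU list when full. */
static void
insert_key(int key)
{
	Entry	   *e;

	if (nentries == CAPACITY)
	{
		Entry	   *victim = head;	/* the least recently used entry */

		unlink_entry(victim);
		free(victim);
		nentries--;
	}

	e = calloc(1, sizeof(Entry));
	e->key = key;
	push_tail(e);
	nentries++;
}

int
main(void)
{
	insert_key(1);
	insert_key(2);
	lookup(1);					/* touching key 1 makes key 2 the LRU entry */
	insert_key(3);				/* evicts key 2 */
	printf("key 1 %s, key 2 %s\n",
		   lookup(1) ? "cached" : "evicted",
		   lookup(2) ? "cached" : "evicted");
	return 0;
}

This prints "key 1 cached, key 2 evicted": looking key 1 up again before inserting key 3 moves it to the tail, leaving key 2 at the head as the eviction victim.
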
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1d0bb6e2e7..5580de2188 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -946,6 +946,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -5020,6 +5047,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 301fa30490..547083c5b0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -844,6 +844,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1917,6 +1932,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3889,6 +3919,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4123,6 +4156,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 377185f7c6..c6955465d4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2207,6 +2207,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2895,6 +2915,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 59f495d743..8cada9b7fd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4032,6 +4032,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCachePath:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b92c948588..9dfd0fb4ff 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2401,6 +2403,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst case here is that we
+ * never see the same parameter values twice, in which case we'd never get a
+ * cache hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4141,6 +4284,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
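
To make the costing above concrete, here is a purely illustrative calculation
(numbers invented for the example, not taken from the patch): suppose
calls = 1000, the ndistinct estimate for the cache key is 100, and
est_cache_entries works out to 80.  Then
evict_ratio = 1 - Min(80, 100) / 100 = 0.2 and
hit_ratio = 1/100 * Min(80, 100) - 100/1000 = 0.8 - 0.1 = 0.7,
so roughly 70% of rescans are charged only the cache lookup, the remaining
30% pay the full input cost, and the eviction charges are applied at a 20%
rate.
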
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..a5461f5d03 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,250 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * find_resultcache_hashop
+ *		Find the hash equals operator for typeoid.
+ *
+ * 'rinfo' must be the RestrictInfo for the qual that we're looking up the
+ * hash equals operator for.
+ *
+ * The given rinfo may have been previously determined to be hash-joinable. In
+ * this case we can simply return the hashjoinoperator.  If the rinfo was not
+ * determined to be hash-joinable, it may still be usable for result cache.
+ * We just need to look and see if there's a valid hash operator for the given
+ * type.
+ */
+static inline Oid
+find_resultcache_hashop(RestrictInfo *rinfo, Oid typeoid)
+{
+	TypeCacheEntry *typentry;
+
+	/*
+	 * Since equality joins are common, it seems worth seeing if this is
+	 * already set to what we need.
+	 */
+	if (OidIsValid(rinfo->hashjoinoperator))
+		return rinfo->hashjoinoperator;
+
+	/* Reject the qual if there are volatile functions */
+	if (contain_volatile_functions((Node *) rinfo))
+		return InvalidOid;
+
+	/* Perform a manual lookup */
+	typentry = lookup_type_cache(typeoid, TYPECACHE_HASH_PROC |
+										  TYPECACHE_EQ_OPR);
+
+	if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		return InvalidOid;
+
+	return typentry->eq_opr;
+}
+
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if param_info and innerrel's lateral_vars can be hashed.
+ *		Returns true if hashing is possible, otherwise returns false.
+ *
+ * Additionally, we collect the outer exprs and the hash operators for each
+ * parameter to innerrel.  These are set in 'param_exprs' and 'operators'
+ * when we return true.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							List **param_exprs, List **operators,
+							RelOptInfo *outerrel, RelOptInfo *innerrel)
+{
+	ListCell   *lc;
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Oid			hasheqop;
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* We only support OpExprs with 2 args */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) linitial(opexpr->args);
+			else
+				expr = (Node *) lsecond(opexpr->args);
+
+			/* see if there's a valid hash equals operator for this type */
+			hasheqop = find_resultcache_hashop(rinfo, exprType(expr));
+
+			/* can't use result cache without a valid hash equals operator */
+			if (!OidIsValid(hasheqop))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, hasheqop);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+		TypeCacheEntry *typentry;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			bms_free(var_relids);
+			return false;
+		}
+		bms_free(var_relids);
+
+		/* Reject if there are any volatile functions */
+		if (contain_volatile_functions(expr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* can't use result cache without a valid hash equals operator */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We're okay to use result cache */
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+	ListCell   *lc;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be a better fit.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget))
+		return NULL;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo))
+			return NULL;
+	}
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									&param_exprs,
+									&hash_operators,
+									outerrel,
+									innerrel))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1726,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1735,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1905,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1930,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
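
As an example of where get_resultcache_path() ends up being used, one of the
plan changes in the join regression test output further down is the lateral
function scan case:

    explain (costs off)
      select count(*) from tenk1 a, lateral generate_series(1,two) g;

With the patch, the Function Scan gains a Result Cache parent with
"Cache Key: a.two", so the function is only re-executed for values of a.two
that haven't been seen yet (or have been evicted).
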
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..5d0e908d05 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -276,6 +279,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -451,6 +459,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1524,6 +1537,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6442,6 +6505,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -7028,6 +7113,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -7073,6 +7159,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..9c166f621d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -751,6 +751,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..1ad44e6ead 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2754,6 +2754,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d5c66780ac..3f654e1155 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1576,6 +1576,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3876,6 +3926,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4094,6 +4155,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0c5dc4d3e8..032336d78b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1036,6 +1036,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b234a6bfe6..b3a80b8c6d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
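
The planner behaviour can be toggled per session with the new GUC; for
example, mirroring what the updated regression tests below do when they want
to keep an existing plan shape:

    set enable_resultcache to off;
    explain (costs off)
    select count(*) from tenk1 a, tenk1 b
      where a.hundred = b.thousand and (b.fivethous % 10) < 10;
    reset enable_resultcache;
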
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..3ffca841c5
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
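
The new dlist_move_tail() is the primitive the executor's LRU bookkeeping
leans on: on a cache hit the entry's list node can simply be moved to the
tail of the state's lru_list, leaving the least-recently-used entries at the
head where eviction will find them first.  (The cache entry struct itself
lives in the executor file and isn't shown here, so its field names aren't
assumed.)
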
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..a71b0e5242 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1999,6 +2000,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 299956f329..01761374dd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -74,6 +74,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -132,6 +133,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -242,6 +244,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d2d3643bea..07066c3c44 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1488,6 +1488,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache that
+ * caches tuples from parameterized paths to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6e62104d0b..04c111d6dd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -773,6 +773,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 1be93be098..67f925e793 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..fe8a2dbd39 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1ae0e5d939..ca06d41dd0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2584,6 +2584,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2599,6 +2600,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..8c29e22d76 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2136,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(30 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2172,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(30 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2208,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                        
+-------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2245,29 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..79a1114b5c
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,159 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Test parallel plans with Result Cache.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+-- Ensure we get a parallel plan.
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Finalize Aggregate
+   ->  Gather
+         Workers Planned: 2
+         ->  Partial Aggregate
+               ->  Nested Loop
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1
+                           Recheck Cond: (unique1 < 1000)
+                           ->  Bitmap Index Scan on tenk1_unique1
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache
+                           Cache Key: t1.twenty
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2
+                                 Index Cond: (unique1 = t1.twenty)
+(13 rows)
+
+-- And ensure the parallel plan gives us the correct results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+RESET parallel_tuple_cost;
+RESET parallel_setup_cost;
+RESET min_parallel_table_scan_size;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 6d048e309c..a243b862d0 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -110,10 +110,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 312c11a4bd..2e89839089 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5a80bfacd8..a46f3d0178 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -203,6 +203,7 @@ test: partition_info
 test: tuplesort
 test: explain
 test: compression
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index eb53668299..eb80a2fe06 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1098,9 +1098,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..150820449c
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,85 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Test parallel plans with Result Cache.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+
+-- Ensure we get a parallel plan.
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- And ensure the parallel plan gives us the correct results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+RESET parallel_tuple_cost;
+RESET parallel_setup_cost;
+RESET min_parallel_table_scan_size;
-- 
2.27.0

v18-0004-Remove-code-duplication-in-nodeResultCache.c.patch (text/plain; charset=US-ASCII)
From ddc5e27a2a2c1c40244c4ccebf9f45af22a587ef Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 8 Dec 2020 17:54:04 +1300
Subject: [PATCH v18 4/4] Remove code duplication in nodeResultCache.c

---
 src/backend/executor/nodeResultCache.c | 123 ++++++++++---------------
 1 file changed, 51 insertions(+), 72 deletions(-)

diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 35d802524c..ac4a5d04e8 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -425,6 +425,54 @@ cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
 	return specialkey_intact;
 }
 
+/*
+ * cache_check_mem
+ *		Check if we've allocated more than our memory budget and, if so,
+ *		reduce the memory used by the cache.  Returns the cache entry
+ *		belonging to 'entry', which may have changed address by shuffling the
+ *		deleted entries back to their optimal position.  Returns NULL if the
+ *		attempt to free enough memory resulted in 'entry' itself being evicted
+ *		from the cache.
+ */
+static ResultCacheEntry *
+cache_check_mem(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
 /*
  * cache_lookup
  *		Perform a lookup to see if we've already cached results based on the
@@ -487,44 +535,7 @@ cache_lookup(ResultCacheState *rcstate, bool *found)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget, then we'll free up some space in
-	 * the cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		/*
-		 * Try to free up some memory.  It's highly unlikely that we'll fail
-		 * to do so here since the entry we've just added is yet to contain
-		 * any tuples and we're able to remove any other entry to reduce the
-		 * memory consumption.
-		 */
-		if (unlikely(!cache_reduce_memory(rcstate, key)))
-			return NULL;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the newly added entry */
-			entry = resultcache_lookup(rcstate->hashtable, NULL);
-			Assert(entry != NULL);
-		}
-	}
-
-	return entry;
+	return cache_check_mem(rcstate, entry);
 }
 
 /*
@@ -570,41 +581,9 @@ cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
 	rcstate->last_tuple = tuple;
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * If we've gone over our memory budget then free up some space in the
-	 * cache.
-	 */
-	if (rcstate->mem_used > rcstate->mem_limit)
-	{
-		ResultCacheKey *key = entry->key;
-
-		if (!cache_reduce_memory(rcstate, key))
-			return false;
-
-		/*
-		 * The process of removing entries from the cache may have caused the
-		 * code in simplehash.h to shuffle elements to earlier buckets in the
-		 * hash table.  If it has, we'll need to find the entry again by
-		 * performing a lookup.  Fortunately, we can detect if this has
-		 * happened by seeing if the entry is still in use and that the key
-		 * pointer matches our expected key.
-		 */
-		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
-		{
-			/*
-			 * We need to repopulate the probeslot as lookups performed during
-			 * the cache evictions above will have stored some other key.
-			 */
-			prepare_probe_slot(rcstate, key);
-
-			/* Re-find the entry */
-			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
-														NULL);
-			Assert(entry != NULL);
-		}
-	}
+	rcstate->entry = entry = cache_check_mem(rcstate, entry);
 
-	return true;
+	return (entry != NULL);
 }
 
 static TupleTableSlot *
-- 
2.27.0

#99Zhihong Yu
zyu@yugabyte.com
In reply to: David Rowley (#98)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,
For show_resultcache_info()

+ if (rcstate->shared_info != NULL)
+ {

The negated condition can be used with a return. This way, the loop can be
unindented.
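
For example, just as a rough sketch of the shape (not the actual patch code;
show_one_worker() here is a made-up helper name for illustration):

    if (rcstate->shared_info == NULL)
        return;

    /* show details from parallel workers, no longer nested inside an "if" */
    for (int n = 0; n < rcstate->shared_info->num_workers; n++)
        show_one_worker(rcstate, n, es);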

+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.

Since the parameterized node is singular, it would be nice if 'them' could be
expanded to refer to the source of the result cache.

+ rcstate->mem_used -= freed_mem;

Should there be an assertion that, after the subtraction, mem_used stays
non-negative?

+               if (found && entry->complete)
+               {
+                   node->stats.cache_hits += 1;    /* stats update */
Once inside the if block, we would return.
+               else
+               {
The else block can be unindented (dropping the else keyword).
+                * return 1 row.  XXX is this worth the check?
+                */
+               if (unlikely(entry->complete))

Since the check is on a flag (with minimal overhead), it seems the check
can be kept, with the question removed.

Cheers

On Sun, Mar 28, 2021 at 7:21 PM David Rowley <dgrowleyml@gmail.com> wrote:


On Wed, 24 Mar 2021 at 00:42, David Rowley <dgrowleyml@gmail.com> wrote:

I've now cleaned up the 0001 patch. I ended up changing a few places
where we pass the RestrictInfo->clause to contain_volatile_functions()
to instead pass the RestrictInfo itself so that there's a possibility
of caching the volatility property for a subsequent call.

I also made a pass over the remaining patches and for the 0004 patch,
aside from the name, "Result Cache", I think that it's ready to go. We
should consider before RC1 whether we should have the enable_resultcache
switch on or off by default.

Does anyone care to have a final look at these patches? I'd like to
start pushing them fairly soon.

I've now pushed the 0001 patch to cache the volatility of PathTarget
and RestrictInfo.

I'll be looking at the remaining patches over the next few days.

Attached are a rebased set of patches on top of current master. The
only change is to the 0003 patch (was 0004) which had an unstable
regression test for parallel plan with a Result Cache. I've swapped
the unstable test for something that shouldn't fail randomly depending
on if a parallel worker did any work or not.

David

#100David Rowley
dgrowleyml@gmail.com
In reply to: Zhihong Yu (#99)
1 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Mon, 29 Mar 2021 at 15:56, Zhihong Yu <zyu@yugabyte.com> wrote:

For show_resultcache_info()

+ if (rcstate->shared_info != NULL)
+ {

The negated condition can be used with a return. This way, the loop can be unindented.

OK. I changed that.

+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.

Since the parameterized node is singular, it would be nice if 'them' could be expanded to refer to the source of the result cache.

I've done a bit of rewording in that paragraph.

+ rcstate->mem_used -= freed_mem;

Should there be an assertion that, after the subtraction, mem_used stays non-negative?

I'm not sure. I ended up adding one, and I also adjusted the #ifdef in
remove_cache_entry(), which has some code to validate the memory
accounting, so that it now compiles when USE_ASSERT_CHECKING is defined.
I'm unsure if that's a bit too expensive to enable in assert-enabled
builds, but I didn't really want to leave the code in there unless it's
going to get some exercise on the buildfarm.
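
For reference, the assertion is roughly of this shape (a simplified sketch,
not the exact code from the attached patch):

    rcstate->mem_used -= freed_mem;
    Assert(rcstate->mem_used >= 0);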

+               if (found && entry->complete)
+               {
+                   node->stats.cache_hits += 1;    /* stats update */

Once inside the if block, we would return.

OK, changed.

+               else
+               {
The else block can be unindented (dropping the else keyword).

changed.

+                * return 1 row.  XXX is this worth the check?
+                */
+               if (unlikely(entry->complete))

Since the check is on a flag (with minimal overhead), it seems the check can be kept, with the question removed.

I changed the comment, but I did leave a mention that I'm still not
sure if it should be an Assert() or an elog.
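
In other words, it's a choice between something like these two (sketch only;
not the exact code or message wording from the patch):

    Assert(!entry->complete);

versus:

    if (unlikely(entry->complete))
        elog(ERROR, "result cache entry unexpectedly complete");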

The attached patch is an updated version of the Result Cache patch
containing the changes for the things you highlighted plus a few other
things.

I pushed the change to simplehash.h and the estimate_num_groups()
change earlier, so only 1 patch remaining.

Also, I noticed the CFBot found another unstable parallel regression
test. This was due to some code in show_resultcache_info() which
skipped parallel workers that appeared not to help out. It looks like
on my machine the worker never got a chance to do anything, but on one
of the CFBot's machines, it did. I ended up changing the EXPLAIN
output so that it shows the cache statistics regardless of whether the
worker helped or not.

David

Attachments:

v19-0001-Add-Result-Cache-executor-node.patch (text/plain; charset=US-ASCII)
From 92e7cd5eaa55e78e20571721928949720f786dfc Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v19] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, Result Cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
Result Cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the Result Cache's
hash table is much smaller than the hash join's due to result cache only
caching useful values rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
Result Cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a Result Cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a Result
Cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a Result Cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for Result Caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a Result Cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
---
 .../postgres_fdw/expected/postgres_fdw.out    |   25 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   24 +-
 src/backend/commands/explain.c                |  140 ++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1137 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   30 +
 src/backend/nodes/outfuncs.c                  |   36 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  285 +++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   31 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  243 ++--
 src/test/regress/expected/resultcache.out     |  159 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   85 ++
 43 files changed, 2845 insertions(+), 186 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..613c46f886 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2139,22 +2141,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..4a544a3ab5 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..1bc82406d9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1770,8 +1770,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4909,6 +4910,25 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of result cache plans for
+        caching results from parameterized scans inside nested-loop joins.
+        This plan type allows scans to the underlying plans to be skipped when
+        the results for the current parameters are already in the cache.  Less
+        commonly looked up results may be evicted from the cache when more
+        space is required for new entries. The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..70b03ea0a8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1284,6 +1286,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1993,6 +1998,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3066,6 +3075,137 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+		ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+		ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+		ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str,
+						 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+						 rcstate->stats.cache_hits,
+						 rcstate->stats.cache_misses,
+						 rcstate->stats.cache_evictions,
+						 rcstate->stats.cache_overflows,
+						 memPeakKb);
+	}
+
+	if (rcstate->shared_info == NULL)
+		return;
+
+	/* Show details from parallel workers */
+	for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+	{
+		ResultCacheInstrumentation *si;
+
+		si = &rcstate->shared_info->sinstrument[n];
+
+		if (es->workers_state)
+			ExplainOpenWorker(n, es);
+
+		/*
+		 * Since the worker's ResultCacheState.mem_used field is unavailable
+		 * to us, ExecEndResultCache will have set the
+		 * ResultCacheInstrumentation.mem_peak field for us.  No need to do
+		 * the zero checks like we did for the serial case above.
+		 */
+		memPeakKb = (si->mem_peak + 1023) / 1024;
+
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str,
+							 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+							 si->cache_hits, si->cache_misses,
+							 si->cache_evictions, si->cache_overflows,
+							 memPeakKb);
+		}
+		else
+		{
+			ExplainPropertyInteger("Cache Hits", NULL,
+								   si->cache_hits, es);
+			ExplainPropertyInteger("Cache Misses", NULL,
+								   si->cache_misses, es);
+			ExplainPropertyInteger("Cache Evictions", NULL,
+								   si->cache_evictions, es);
+			ExplainPropertyInteger("Cache Overflows", NULL,
+								   si->cache_overflows, es);
+			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+								   es);
+		}
+
+		if (es->workers_state)
+			ExplainCloseWorker(n, es);
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 74ac59faa1..c6bffaf199 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 4543ac79ed..18cbfdaeac 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -254,6 +255,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..d68b8c23a7 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3496,3 +3496,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function OIDs of the equality functions to use;
+ *              must be the same length as the 'param_exprs' list.
+ * collations: collation OIDs to use for equality comparison; must be the
+ *              same length as the 'param_exprs' list.
+ * parent: parent executor node
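+ *
+ * The step sequence built here is, roughly: deform the inner and outer
+ * tuples, then for each key column push EEOP_INNER_VAR, EEOP_OUTER_VAR,
+ * EEOP_NOT_DISTINCT and EEOP_QUAL, with each EEOP_QUAL jumping straight to
+ * the final EEOP_DONE as soon as a pair of values is found to be distinct.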
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 29766d8196..9f8c7582e0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -325,6 +326,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -713,6 +719,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..906b68c945
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1137 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above parameterized nodes in the plan
+ * tree in order to cache results from them.  The intention here is that a
+ * repeat scan with a parameter value that has already been seen by the node
+ * can fetch tuples from the cache rather than having to re-scan the outer
+ * node all over again.  The query planner may choose to make use of one of
+ * these when it thinks rescans for previously seen values are likely enough
+ * to warrant adding the additional node.
+ *
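+ * As an illustration, a parameterized nested loop using this node might
+ * produce a plan like the following (hypothetical tables and index; the
+ * EXPLAIN ANALYZE counts are purely illustrative):
+ *
+ *   Nested Loop
+ *     ->  Seq Scan on t1
+ *     ->  Result Cache
+ *           Cache Key: t1.x
+ *           Hits: 980  Misses: 20  Evictions: 0  Overflows: 0 ...
+ *           ->  Index Scan using t2_x_idx on t2
+ *                 Index Cond: (x = t1.x)
+ *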
+ * The cache itself is a hash table.  When the cache fills, we never spill
+ * tuples to disk; instead, we evict the least recently used cache entry.  We
+ * track recency by always pushing new entries, and entries we look up, onto
+ * the tail of a doubly linked list.  This means the least recently used
+ * entries accumulate at the head of that list, which is where eviction
+ * begins.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
+
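+/*
+ * Sketch of the typical state transitions (see ExecResultCache() for the
+ * full details):
+ *
+ *   RC_CACHE_LOOKUP -- cache hit ----> RC_CACHE_FETCH_NEXT_TUPLE
+ *   RC_CACHE_LOOKUP -- cache miss ---> RC_FILLING_CACHE
+ *   RC_CACHE_LOOKUP / RC_FILLING_CACHE -- out of memory --> RC_CACHE_BYPASS_MODE
+ *
+ * Every state eventually ends in RC_END_OF_SCAN, and a rescan resets the
+ * machine back to RC_CACHE_LOOKUP.
+ */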
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
+
+/* ResultCacheTuple stores an individually cached tuple */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
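+/*
+ * The first simplehash.h include below, with SH_DECLARE set, emits only the
+ * declarations for the "resultcache" hash table type.  The second include,
+ * further down with SH_DEFINE set, emits the implementation once the hash
+ * and equality support functions have been declared.
+ */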
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+
+	Assert(rcstate->mem_used >= 0);
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+#ifdef USE_ASSERT_CHECKING
+	/*
+	 * Validate the memory accounting code is correct in assert builds. XXX is
+	 * this too expensive for USE_ASSERT_CHECKING?
+	 */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	Assert(rcstate->mem_used >= 0);
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+	uint64		evictions = 0;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		evictions++;
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	rcstate->stats.cache_evictions += evictions;	/* Update Stats */
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				TupleTableSlot *outerslot;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/* Handle cache miss */
+				node->stats.cache_misses += 1;	/* stats update */
+
+				if (found)
+				{
+					/*
+					 * A cache entry was found, but the scan for that entry
+					 * did not run to completion.  We'll just remove all
+					 * tuples and start again.  It might be tempting to
+					 * continue where we left off, but there's no guarantee
+					 * the outer node will produce the tuples in the same
+					 * order as it did last time.
+					 */
+					entry_purge_tuples(node, entry);
+				}
+
+				/* Scan the outer node for a tuple to cache */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/*
+					 * cache_lookup may have returned NULL due to failure to
+					 * free enough cache space, so ensure we don't do anything
+					 * here that assumes it worked. There's no need to go into
+					 * bypass mode here as we're setting rc_status to end of
+					 * scan.
+					 */
+					if (likely(entry))
+						entry->complete = true;
+
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				node->entry = entry;
+
+				/*
+				 * If we failed to create the entry or failed to store the
+				 * tuple in the entry, then go into bypass mode.
+				 */
+				if (unlikely(entry == NULL ||
+					!cache_store_tuple(node, outerslot)))
+				{
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out last_tuple as we'll stay in bypass
+					 * mode until the end of the scan.
+					 */
+				}
+				else
+				{
+					/*
+					 * If we only expect a single row from this scan then we
+					 * can mark that we're not expecting more.  This allows
+					 * cache lookups to work even when the scan has not been
+					 * executed to completion.
+					 */
+					entry->complete = node->singlerow;
+					node->rc_status = RC_FILLING_CACHE;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX maybe this should be an Assert?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is completed after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match, e.g. a join operator performing a unique join
+	 * is able to skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple.  This allows us to mark it as complete right away.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info != NULL && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1d0bb6e2e7..5580de2188 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -946,6 +946,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -5020,6 +5047,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 301fa30490..547083c5b0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -844,6 +844,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1917,6 +1932,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -3889,6 +3919,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4123,6 +4156,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 377185f7c6..c6955465d4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2207,6 +2207,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2895,6 +2915,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 59f495d743..8cada9b7fd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4032,6 +4032,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b92c948588..8994e53643 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2401,6 +2403,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see any parameter value twice, in which case we'd never get a cache
+ * hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To give us a better estimate of how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
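+
+	/*
+	 * For example (values purely illustrative): with est_cache_entries = 512
+	 * and ndistinct = 1024, evict_ratio = 1.0 - 512/1024 = 0.5, so the
+	 * eviction charges below are applied at half weight per rescan.
+	 */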
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
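+
+	/*
+	 * Continuing the illustrative numbers above: with est_cache_entries =
+	 * 512, ndistinct = 1024 and calls = 10240, the hit ratio comes out at
+	 * 512/1024 - 1024/10240 = 0.5 - 0.1 = 0.4, i.e. roughly 40% of rescans
+	 * are expected to be served from the cache.
+	 */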
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
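+
+	/*
+	 * As a purely illustrative example, with the default cpu_tuple_cost
+	 * (0.01) and cpu_operator_cost (0.0025) and, say, 10 tuples per entry,
+	 * the caching charge above amounts to 0.01 + 0.0025 * 10 = 0.035 per
+	 * rescan, while the eviction charges (at evict_ratio = 0.5) add
+	 * 0.01 * 0.5 + 0.00025 * 0.5 * 10 = 0.00625.
+	 */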
+
+	/*
+	 * Getting the first row must be also be proportioned according to the
+	 * Getting the first row must also be proportioned according to the
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4141,6 +4284,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..67289d1806 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving inner uniqueness here to allow a
+			 * ResultCache to be considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,251 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * find_resultcache_hashop
+ *		Find the hash equals operator for typeoid.
+ *
+ * 'rinfo' must be the RestrictInfo for the qual that we're looking up the
+ * hash equals operator for.
+ *
+ * The given rinfo may have been previously determined to be hash-joinable, in
+ * which case we can simply return its hashjoinoperator.  If the rinfo was not
+ * determined to be hash-joinable, it may still be usable for Result Cache; we
+ * just need to check whether there's a valid hash operator for the given
+ * type.
+ */
+static inline Oid
+find_resultcache_hashop(RestrictInfo *rinfo, Oid typeoid)
+{
+	TypeCacheEntry *typentry;
+
+	/*
+	 * Since equality joins are common, it seems worth seeing if this is
+	 * already set to what we need.
+	 */
+	if (OidIsValid(rinfo->hashjoinoperator))
+		return rinfo->hashjoinoperator;
+
+	/* Reject the qual if there are volatile functions */
+	if (contain_volatile_functions((Node *) rinfo))
+		return InvalidOid;
+
+	/* Perform a manual lookup */
+	typentry = lookup_type_cache(typeoid, TYPECACHE_HASH_PROC |
+										  TYPECACHE_EQ_OPR);
+
+	if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		return InvalidOid;
+
+	return typentry->eq_opr;
+}
+
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if param_info and innerrel's lateral_vars can be hashed.
+ *		Returns true if hashing is possible, otherwise returns false.
+ *
+ * Additionally, we collect the outer exprs and the hash operators for each
+ * parameter to innerrel.  These are set in 'param_exprs' and 'operators'
+ * when we return true.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List **param_exprs, List **operators)
+{
+	ListCell   *lc;
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Oid			hasheqop;
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			opexpr = (OpExpr *) rinfo->clause;
+
+			/* We only support OpExprs with 2 args */
+			if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2 ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			if (rinfo->outer_is_left)
+				expr = (Node *) linitial(opexpr->args);
+			else
+				expr = (Node *) lsecond(opexpr->args);
+
+			/* see if there's a valid hash equals operator for this type */
+			hasheqop = find_resultcache_hashop(rinfo, exprType(expr));
+
+			/* can't use result cache without a valid hash equals operator */
+			if (!OidIsValid(hasheqop))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			*operators = lappend_oid(*operators, hasheqop);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		Relids		var_relids = NULL;
+		TypeCacheEntry *typentry;
+
+		if (IsA(expr, Var))
+			var_relids = bms_make_singleton(((Var *) expr)->varno);
+		else if (IsA(expr, PlaceHolderVar))
+		{
+			PlaceHolderVar *phv = (PlaceHolderVar *) expr;
+
+			var_relids = pull_varnos(root, (Node *) phv->phexpr);
+		}
+		else
+			Assert(false);
+
+		/* No need for lateral vars that are from the innerrel itself */
+		/* XXX can this actually happen? */
+		if (bms_overlap(var_relids, innerrel->relids))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			bms_free(var_relids);
+			return false;
+		}
+		bms_free(var_relids);
+
+		/* Reject if there are any volatile functions */
+		if (contain_volatile_functions(expr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* can't use result cache without a valid hash equals operator */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We're okay to use result cache */
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop of 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+	ListCell   *lc;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * There's no point in bothering with any of this unless we expect to
+	 * perform more than one inner scan; the first scan is always going to be
+	 * a cache miss.  Such a path would likely be rejected later on cost
+	 * anyway, so this check just saves some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which will mean result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget))
+		return NULL;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo))
+			return NULL;
+	}
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									outerrel,
+									innerrel,
+									&param_exprs,
+									&hash_operators))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1727,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1736,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1906,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1931,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..5d0e908d05 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -90,6 +90,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -276,6 +279,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -451,6 +459,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1524,6 +1537,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6442,6 +6505,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -7028,6 +7113,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -7073,6 +7159,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..9c166f621d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -751,6 +751,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..1ad44e6ead 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2754,6 +2754,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d5c66780ac..3f654e1155 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1576,6 +1576,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3876,6 +3926,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4094,6 +4155,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0c5dc4d3e8..032336d78b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1036,6 +1036,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b234a6bfe6..b3a80b8c6d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..ad04fd69ac 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,6 +265,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..df671d16f9
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..a71b0e5242 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -1999,6 +2000,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 299956f329..01761374dd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -74,6 +74,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -132,6 +133,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -242,6 +244,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d2d3643bea..07066c3c44 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1488,6 +1488,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a node that
+ * caches tuples from a parameterized subpath to save the underlying node from
+ * having to be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6e62104d0b..04c111d6dd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -773,6 +773,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 1be93be098..67f925e793 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..fe8a2dbd39 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1ae0e5d939..ca06d41dd0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2584,6 +2584,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2599,6 +2600,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..5e6b02cdd7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2484,6 +2484,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2507,6 +2508,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3611,8 +3613,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3622,17 +3624,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3642,9 +3646,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4158,8 +4164,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4169,11 +4175,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4194,13 +4203,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4210,15 +4219,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4264,14 +4279,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4945,34 +4963,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5027,14 +5051,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..46887a4c7f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,36 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2137,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(31 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2174,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2211,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                         
+--------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2249,30 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..79a1114b5c
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,159 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Recheck Cond: (unique1 < 1000)
+               Heap Blocks: exact=333
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=1000 loops=1)
+                     Index Cond: (unique1 < 1000)
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: 0
+(13 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+                                     explain_resultcache                                     
+---------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=800 loops=1)
+         ->  Bitmap Heap Scan on tenk1 t2 (actual rows=800 loops=1)
+               Recheck Cond: (unique1 < 800)
+               Heap Blocks: exact=318
+               ->  Bitmap Index Scan on tenk1_unique1 (actual rows=800 loops=1)
+                     Index Cond: (unique1 < 800)
+         ->  Result Cache (actual rows=1 loops=800)
+               Cache Key: t2.thousand
+               Hits: Zero  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=800)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: 0
+(13 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+-- Test parallel plans with Result Cache.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+-- Ensure we get a parallel plan.
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Finalize Aggregate
+   ->  Gather
+         Workers Planned: 2
+         ->  Partial Aggregate
+               ->  Nested Loop
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1
+                           Recheck Cond: (unique1 < 1000)
+                           ->  Bitmap Index Scan on tenk1_unique1
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache
+                           Cache Key: t1.twenty
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2
+                                 Index Cond: (unique1 = t1.twenty)
+(13 rows)
+
+-- And ensure the parallel plan gives us the correct results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+RESET parallel_tuple_cost;
+RESET parallel_setup_cost;
+RESET min_parallel_table_scan_size;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 6d048e309c..a243b862d0 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -110,10 +110,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(18 rows)
+(19 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 312c11a4bd..2e89839089 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5a80bfacd8..a46f3d0178 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -203,6 +203,7 @@ test: partition_info
 test: tuplesort
 test: explain
 test: compression
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index eb53668299..eb80a2fe06 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1098,9 +1098,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6a209a27aa..26dd6704a2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -539,6 +539,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -548,6 +549,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..150820449c
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,85 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 800;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_hashjoin;
+
+-- Test parallel plans with Result Cache.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+
+-- Ensure we get a parallel plan.
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- And ensure the parallel plan gives us the correct results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+RESET parallel_tuple_cost;
+RESET parallel_setup_cost;
+RESET min_parallel_table_scan_size;
-- 
2.27.0

#101Zhihong Yu
zyu@yugabyte.com
In reply to: David Rowley (#100)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Hi,
In paraminfo_get_equal_hashops(),

+       /* Reject if there are any volatile functions */
+       if (contain_volatile_functions(expr))
+       {

You can move the above code to just ahead of:

+       if (IsA(expr, Var))
+           var_relids = bms_make_singleton(((Var *) expr)->varno);

This way, when we return early, var_relids doesn't need to be populated.
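
Roughly this ordering (just a sketch; the cleanup your reject branch
already does, and the rest of paraminfo_get_equal_hashops(), are assumed
unchanged):

/* Reject if there are any volatile functions -- do this first */
if (contain_volatile_functions(expr))
    return false;       /* plus the existing cleanup for the reject case */

/* Only now build var_relids, so the early return above skips this work */
if (IsA(expr, Var))
    var_relids = bms_make_singleton(((Var *) expr)->varno);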

Cheers

On Tue, Mar 30, 2021 at 4:42 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Mon, 29 Mar 2021 at 15:56, Zhihong Yu <zyu@yugabyte.com> wrote:

For show_resultcache_info()

+ if (rcstate->shared_info != NULL)
+ {

The negated condition can be used with a return. This way, the loop can
be unindented.

OK. I changed that.

+ * ResultCache nodes are intended to sit above a parameterized node in the
+ * plan tree in order to cache results from them.

Since the parameterized node is singular, it would be nice if 'them' could
be expanded to refer to the source of the result cache.

I've done a bit of rewording in that paragraph.

+ rcstate->mem_used -= freed_mem;

Should there be an assertion that, after the subtraction, mem_used stays
non-negative?

I'm not sure. I ended up adding one and also adjusting the #ifdef in
remove_cache_entry(), which had some code to validate the memory
accounting, so that it compiles when USE_ASSERT_CHECKING is defined.
I'm unsure if that's a bit too expensive to enable during debug builds,
but I didn't really want to leave the code in there unless it's going to
get some exercise on the buildfarm.
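
For what it's worth, the invariant being checked is just that we never
free more memory than we've accounted for; roughly this (a sketch, not
necessarily the exact form used in the attached patch):

/* sketch: verify the accounting before adjusting it */
Assert(freed_mem <= rcstate->mem_used);
rcstate->mem_used -= freed_mem;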

+               if (found && entry->complete)
+               {
+                   node->stats.cache_hits += 1;    /* stats update */

Once inside the if block, we would return.

OK, changed.

+               else
+               {
The else block can be unindented (dropping else keyword).

changed.

+                * return 1 row.  XXX is this worth the check?
+                */
+               if (unlikely(entry->complete))

Since the check is on a flag (with minimal overhead), it seems the check
can be kept, with the question removed.

I changed the comment, but I did leave a mention that I'm still not
sure if it should be an Assert() or an elog.
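
For reference, the two forms I'm weighing look roughly like this (a
sketch only; I'm not claiming either matches the attached patch exactly):

/* option 1: keep a runtime check even in production builds */
if (unlikely(entry->complete))
    elog(ERROR, "cache entry is unexpectedly marked complete");

/* option 2: treat it as a can't-happen case, checked only in assert builds */
Assert(!entry->complete);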

The attached patch is an updated version of the Result Cache patch
containing the changes for the things you highlighted plus a few other
things.

I pushed the change to simplehash.h and the estimate_num_groups()
change earlier, so only 1 patch remaining.

Also, I noticed the CFbot found another unstable parallel regression
test. This was due to some code in show_resultcache_info() which
skipped parallel workers that appeared to not help out. It looks like
on my machine the worker never got a chance to do anything, but on one
of the CFbot's machines, it did. I ended up changing the EXPLAIN
output so that it shows the cache statistics regardless of whether the
worker helped or not.

David

#102David Rowley
dgrowleyml@gmail.com
In reply to: Zhihong Yu (#101)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 31 Mar 2021 at 05:34, Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
In paraminfo_get_equal_hashops(),

+       /* Reject if there are any volatile functions */
+       if (contain_volatile_functions(expr))
+       {

You can move the above code to just ahead of:

+       if (IsA(expr, Var))
+           var_relids = bms_make_singleton(((Var *) expr)->varno);

This way, when we return early, var_relids doesn't need to be populated.

Thanks for having another look. I did a bit more work in that area
and removed that code. I dug a little deeper and I can't see any way
that a lateral_var on a rel can refer to anything inside the rel. It
looks like that code was just a bit over-paranoid about that.

I also added some additional caching in RestrictInfo to cache the hash
equality operator to use for the result cache. This saves checking
this each time we consider a join during the join search. In many
cases we would have used the value cached in
RestrictInfo.hashjoinoperator; however, for non-equality joins, that
would have been set to InvalidOid. We can still use Result Cache for
non-equality joins.

I've now pushed the main patch.

There's a couple of things I'm not perfectly happy with:

1. The name. There's a discussion on [1]/messages/by-id/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com if anyone wants to talk about that.
2. For lateral joins, there's no place to cache the hash equality
operator. Maybe there's some rework to do to add the ability to check
things for those like we use RestrictInfo for regular joins.
3. No ability to cache n_distinct estimates. This must be repeated
each time we consider a join. RestrictInfo allows caching for this to
speed up clauselist_selectivity() for other join types.

There was no consensus reached on the name of the node. "Tuple Cache"
seems like the favourite so far, but there's not been a great deal of
input. At least not enough that I was motivated to rename everything.
People will perhaps have more time to consider names during beta.

Thank you to everyone who gave input and reviewed this patch. It would
be great to get feedback on the performance with real workloads. As
mentioned in the commit message, there is a danger that it causes
performance regressions when n_distinct estimates are significantly
underestimated.

I'm off to look at the buildfarm now.

David

[1]: /messages/by-id/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com

#103David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#102)
1 attachment(s)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 1 Apr 2021 at 12:49, David Rowley <dgrowleyml@gmail.com> wrote:

I'm off to look at the buildfarm now.

Well, it looks like the buildfarm didn't like the patch much. I had to
revert the patch.

It appears I overlooked some details in the EXPLAIN ANALYZE output
when force_parallel_mode = regress is on. To make this work I had to
change the EXPLAIN output so that it does not show the main process's
cache Hit/Miss/Eviction details when there are zero misses. In the
animals running force_parallel_mode = regress there was an additional
line for the parallel worker containing the expected cache
hits/misses/evictions as well as the one for the main process. The
main process was not doing any work. I took inspiration from
show_sort_info() which does not show the details for the main process
when it did not help with the Sort.

There was also an issue on florican [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-04-01%2000%3A28%3A12 which appears to be due to
that machine being 32-bit. I should have considered that when
thinking of the cache eviction test. I originally tried to make the
test as small as possible by lowering work_mem down to 64kB and only
using enough rows to overflow that by a small amount. I think what's
happening on florican is that, because the pointer fields in the cache
are 32 bits instead of 64, more records fit into the cache and there
are no evictions. I've scaled that test up a bit now to use 1200 rows
instead of 800.
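
To put some rough numbers on that (purely illustrative; the per-entry
payload and pointer-field counts below are assumptions for the sake of
the arithmetic, not the actual sizes of the cache's data structures):

#include <stdio.h>

int
main(void)
{
    const long work_mem_bytes = 64 * 1024; /* the 64kB used by the test */
    const long payload_bytes = 40;         /* assumed cached-tuple payload */
    const long ptr_fields = 6;             /* assumed bookkeeping pointers per entry */

    for (int ptr_size = 4; ptr_size <= 8; ptr_size += 4)
    {
        long        per_entry = payload_bytes + ptr_fields * ptr_size;

        printf("%d-byte pointers: ~%ld bytes/entry, ~%ld entries fit in 64kB\n",
               ptr_size, per_entry, work_mem_bytes / per_entry);
    }

    return 0;
}

With those made-up sizes, roughly 1024 entries fit on a 32-bit build
versus about 744 on a 64-bit one, which would explain why 800 rows
caused no evictions on florican while 1200 rows should overflow the
cache on both.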

The 32-bit machines also were reporting a different number of exact
blocks in the bitmap heap scan. I've now just disabled bitmap scans
for those tests.

I've attached the updated patch. I'll let the CFbot grab this to
ensure it's happy with it before I go looking to push it again.

David

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-04-01%2000%3A28%3A12

Attachments:

v20-0001-Add-Result-Cache-executor-node.patch (text/plain; charset=US-ASCII)
From ed4ac5232bdb6c041a84485c246a8e2322e5909d Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 2 Jul 2020 19:29:32 +1200
Subject: [PATCH v20] Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
---
 .../postgres_fdw/expected/postgres_fdw.out    |   25 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 doc/src/sgml/config.sgml                      |   24 +-
 src/backend/commands/explain.c                |  143 +++
 src/backend/executor/Makefile                 |    1 +
 src/backend/executor/execAmi.c                |    5 +
 src/backend/executor/execExpr.c               |  134 ++
 src/backend/executor/execParallel.c           |   18 +
 src/backend/executor/execProcnode.c           |   10 +
 src/backend/executor/nodeResultCache.c        | 1137 +++++++++++++++++
 src/backend/nodes/copyfuncs.c                 |   31 +
 src/backend/nodes/outfuncs.c                  |   37 +
 src/backend/nodes/readfuncs.c                 |   22 +
 src/backend/optimizer/path/allpaths.c         |    4 +
 src/backend/optimizer/path/costsize.c         |  148 +++
 src/backend/optimizer/path/joinpath.c         |  214 ++++
 src/backend/optimizer/plan/createplan.c       |   87 ++
 src/backend/optimizer/plan/initsplan.c        |   41 +
 src/backend/optimizer/plan/setrefs.c          |    9 +
 src/backend/optimizer/plan/subselect.c        |    5 +
 src/backend/optimizer/util/pathnode.c         |   71 +
 src/backend/optimizer/util/restrictinfo.c     |    3 +
 src/backend/utils/misc/guc.c                  |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/executor/executor.h               |    7 +
 src/include/executor/nodeResultCache.h        |   31 +
 src/include/lib/ilist.h                       |   19 +
 src/include/nodes/execnodes.h                 |   66 +
 src/include/nodes/nodes.h                     |    3 +
 src/include/nodes/pathnodes.h                 |   22 +
 src/include/nodes/plannodes.h                 |   21 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    7 +
 src/test/regress/expected/aggregates.out      |    2 +
 src/test/regress/expected/join.out            |  131 +-
 src/test/regress/expected/partition_prune.out |  243 ++--
 src/test/regress/expected/resultcache.out     |  158 +++
 src/test/regress/expected/subselect.out       |   20 +-
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/serial_schedule              |    1 +
 src/test/regress/sql/aggregates.sql           |    2 +
 src/test/regress/sql/join.sql                 |    2 +
 src/test/regress/sql/partition_prune.sql      |    3 +
 src/test/regress/sql/resultcache.sql          |   91 ++
 45 files changed, 2831 insertions(+), 186 deletions(-)
 create mode 100644 src/backend/executor/nodeResultCache.c
 create mode 100644 src/include/executor/nodeResultCache.h
 create mode 100644 src/test/regress/expected/resultcache.out
 create mode 100644 src/test/regress/sql/resultcache.sql

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index eff7b04f11..2be14c5437 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1602,6 +1602,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL
  20 |  0 | AAA020
 (10 rows)
 
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -1628,6 +1629,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
  20 |  0 | AAA020
 (10 rows)
 
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
@@ -2139,22 +2141,25 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
 -- join with lateral reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
-                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1."C 1"
    ->  Nested Loop
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         ->  HashAggregate
-               Output: t2.c1, t3.c1
-               Group Key: t2.c1, t3.c1
-               ->  Foreign Scan
-                     Output: t2.c1, t3.c1
-                     Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                     Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+         ->  Result Cache
+               Cache Key: t1.c2
+               ->  Subquery Scan on q
+                     ->  HashAggregate
+                           Output: t2.c1, t3.c1
+                           Group Key: t2.c1, t3.c1
+                           ->  Foreign Scan
+                                 Output: t2.c1, t3.c1
+                                 Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
+(16 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 806a5bca28..21a29cc062 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -502,10 +502,12 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+SET enable_resultcache TO off;
 -- right outer join + left outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
+RESET enable_resultcache;
 -- left outer join + right outer join
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e2e8c4c3..9d87b5097a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1770,8 +1770,9 @@ include_dir 'conf.d'
         fact in mind when choosing the value.  Sort operations are used
         for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
         and merge joins.
-        Hash tables are used in hash joins, hash-based aggregation, and
-        hash-based processing of <literal>IN</literal> subqueries.
+        Hash tables are used in hash joins, hash-based aggregation, result
+        cache nodes and hash-based processing of <literal>IN</literal>
+        subqueries.
        </para>
        <para>
         Hash-based operations are generally more sensitive to memory
@@ -4925,6 +4926,25 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-resultcache" xreflabel="enable_resultcache">
+      <term><varname>enable_resultcache</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_resultcache</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of result cache plans for
+        caching results from parameterized scans inside nested-loop joins.
+        This plan type allows scans to the underlying plans to be skipped when
+        the results for the current parameters are already in the cache.  Less
+        commonly looked up results may be evicted from the cache when more
+        space is required for new entries. The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-mergejoin" xreflabel="enable_mergejoin">
       <term><varname>enable_mergejoin</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 872aaa7aed..d346e86c76 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,6 +108,8 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1284,6 +1286,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
+		case T_ResultCache:
+			pname = sname = "Result Cache";
+			break;
 		case T_Sort:
 			pname = sname = "Sort";
 			break;
@@ -1996,6 +2001,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info(castNode(HashState, planstate), es);
 			break;
+		case T_ResultCache:
+			show_resultcache_info(castNode(ResultCacheState, planstate),
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -3063,6 +3072,140 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on result cache hits/misses/evictions and memory usage.
+ */
+static void
+show_resultcache_info(ResultCacheState *rcstate, List *ancestors, ExplainState *es)
+{
+	Plan	   *plan = ((PlanState *) rcstate)->plan;
+	ListCell   *lc;
+	List	   *context;
+	StringInfoData keystr;
+	char	   *seperator = "";
+	bool		useprefix;
+	int64		memPeakKb;
+
+	initStringInfo(&keystr);
+
+	/*
+	 * It's hard to imagine having a result cache with fewer than 2 RTEs, but
+	 * let's just keep the same useprefix logic as elsewhere in this file.
+	 */
+	useprefix = list_length(es->rtable) > 1 || es->verbose;
+
+	/* Set up deparsing context */
+	context = set_deparse_context_plan(es->deparse_cxt,
+									   plan,
+									   ancestors);
+
+	foreach(lc, ((ResultCache *) plan)->param_exprs)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+
+		appendStringInfoString(&keystr, seperator);
+
+		appendStringInfoString(&keystr, deparse_expression(expr, context,
+														   useprefix, false));
+		seperator = ", ";
+	}
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyText("Cache Key", keystr.data, es);
+	}
+	else
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Cache Key: %s\n", keystr.data);
+	}
+
+	pfree(keystr.data);
+
+	if (!es->analyze)
+		return;
+
+	/*
+	 * mem_peak is only set when we freed memory, so we must use mem_used when
+	 * mem_peak is 0.
+	 */
+	if (rcstate->stats.mem_peak > 0)
+		memPeakKb = (rcstate->stats.mem_peak + 1023) / 1024;
+	else
+		memPeakKb = (rcstate->mem_used + 1023) / 1024;
+
+	if (rcstate->stats.cache_misses > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainPropertyInteger("Cache Hits", NULL, rcstate->stats.cache_hits, es);
+			ExplainPropertyInteger("Cache Misses", NULL, rcstate->stats.cache_misses, es);
+			ExplainPropertyInteger("Cache Evictions", NULL, rcstate->stats.cache_evictions, es);
+			ExplainPropertyInteger("Cache Overflows", NULL, rcstate->stats.cache_overflows, es);
+			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+		}
+		else
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str,
+							 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+							 rcstate->stats.cache_hits,
+							 rcstate->stats.cache_misses,
+							 rcstate->stats.cache_evictions,
+							 rcstate->stats.cache_overflows,
+							 memPeakKb);
+		}
+	}
+
+	if (rcstate->shared_info == NULL)
+		return;
+
+	/* Show details from parallel workers */
+	for (int n = 0; n < rcstate->shared_info->num_workers; n++)
+	{
+		ResultCacheInstrumentation *si;
+
+		si = &rcstate->shared_info->sinstrument[n];
+
+		if (es->workers_state)
+			ExplainOpenWorker(n, es);
+
+		/*
+		 * Since the worker's ResultCacheState.mem_used field is unavailable
+		 * to us, ExecEndResultCache will have set the
+		 * ResultCacheInstrumentation.mem_peak field for us.  No need to do
+		 * the zero checks like we did for the serial case above.
+		 */
+		memPeakKb = (si->mem_peak + 1023) / 1024;
+
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str,
+							 "Hits: " UINT64_FORMAT "  Misses: " UINT64_FORMAT "  Evictions: " UINT64_FORMAT "  Overflows: " UINT64_FORMAT "  Memory Usage: " INT64_FORMAT "kB\n",
+							 si->cache_hits, si->cache_misses,
+							 si->cache_evictions, si->cache_overflows,
+							 memPeakKb);
+		}
+		else
+		{
+			ExplainPropertyInteger("Cache Hits", NULL,
+								   si->cache_hits, es);
+			ExplainPropertyInteger("Cache Misses", NULL,
+								   si->cache_misses, es);
+			ExplainPropertyInteger("Cache Evictions", NULL,
+								   si->cache_evictions, es);
+			ExplainPropertyInteger("Cache Overflows", NULL,
+								   si->cache_overflows, es);
+			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb,
+								   es);
+		}
+
+		if (es->workers_state)
+			ExplainCloseWorker(n, es);
+	}
+}
+
 /*
  * Show information on hash aggregate memory usage and batches.
  */
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 680fd69151..f08b282a5e 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	nodeProjectSet.o \
 	nodeRecursiveunion.o \
 	nodeResult.o \
+	nodeResultCache.o \
 	nodeSamplescan.o \
 	nodeSeqscan.o \
 	nodeSetOp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 58a8aa5ab7..b3726a54f3 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -44,6 +44,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -254,6 +255,10 @@ ExecReScan(PlanState *node)
 			ExecReScanMaterial((MaterialState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecReScanResultCache((ResultCacheState *) node);
+			break;
+
 		case T_SortState:
 			ExecReScanSort((SortState *) node);
 			break;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e33231f7be..23c0fb9379 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3696,3 +3696,137 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 
 	return state;
 }
+
+/*
+ * Build equality expression that can be evaluated using ExecQual(), returning
+ * true if the expression context's inner/outer tuples are equal.  Datums in
+ * the inner/outer slots are assumed to be in the same order and quantity as
+ * the 'eqfunctions' parameter.  NULLs are treated as equal.
+ *
+ * desc: tuple descriptor of the to-be-compared tuples
+ * lops: the slot ops for the inner tuple slots
+ * rops: the slot ops for the outer tuple slots
+ * eqfunctions: array of function OIDs of the equality functions to use;
+ * must be the same length as the 'param_exprs' list
+ * collations: collation OIDs to use for the equality comparisons; must be
+ * the same length as the 'param_exprs' list
+ * parent: parent executor node
+ */
+ExprState *
+ExecBuildParamSetEqual(TupleDesc desc,
+					   const TupleTableSlotOps *lops,
+					   const TupleTableSlotOps *rops,
+					   const Oid *eqfunctions,
+					   const Oid *collations,
+					   const List *param_exprs,
+					   PlanState *parent)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	int			maxatt = list_length(param_exprs);
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+
+	state->expr = NULL;
+	state->flags = EEO_FLAG_IS_QUAL;
+	state->parent = parent;
+
+	scratch.resvalue = &state->resvalue;
+	scratch.resnull = &state->resnull;
+
+	/* push deform steps */
+	scratch.opcode = EEOP_INNER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = lops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	scratch.opcode = EEOP_OUTER_FETCHSOME;
+	scratch.d.fetch.last_var = maxatt;
+	scratch.d.fetch.fixed = false;
+	scratch.d.fetch.known_desc = desc;
+	scratch.d.fetch.kind = rops;
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
+
+	for (int attno = 0; attno < maxatt; attno++)
+	{
+		Form_pg_attribute att = TupleDescAttr(desc, attno);
+		Oid			foid = eqfunctions[attno];
+		Oid			collid = collations[attno];
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		AclResult	aclresult;
+
+		/* Check permission to call function */
+		aclresult = pg_proc_aclcheck(foid, GetUserId(), ACL_EXECUTE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_FUNCTION, get_func_name(foid));
+
+		InvokeFunctionExecuteHook(foid);
+
+		/* Set up the primary fmgr lookup information */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(2));
+		fmgr_info(foid, finfo);
+		fmgr_info_set_expr(NULL, finfo);
+		InitFunctionCallInfoData(*fcinfo, finfo, 2,
+								 collid, NULL, NULL);
+
+		/* left arg */
+		scratch.opcode = EEOP_INNER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[0].value;
+		scratch.resnull = &fcinfo->args[0].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* right arg */
+		scratch.opcode = EEOP_OUTER_VAR;
+		scratch.d.var.attnum = attno;
+		scratch.d.var.vartype = att->atttypid;
+		scratch.resvalue = &fcinfo->args[1].value;
+		scratch.resnull = &fcinfo->args[1].isnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* evaluate distinctness */
+		scratch.opcode = EEOP_NOT_DISTINCT;
+		scratch.d.func.finfo = finfo;
+		scratch.d.func.fcinfo_data = fcinfo;
+		scratch.d.func.fn_addr = finfo->fn_addr;
+		scratch.d.func.nargs = 2;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+
+		/* then emit EEOP_QUAL to detect if result is false (or null) */
+		scratch.opcode = EEOP_QUAL;
+		scratch.d.qualexpr.jumpdone = -1;
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps,
+								   state->steps_len - 1);
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_QUAL);
+		Assert(as->d.qualexpr.jumpdone == -1);
+		as->d.qualexpr.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..366d0b20b9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -35,6 +35,7 @@
 #include "executor/nodeIncrementalSort.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeIndexscan.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSort.h"
 #include "executor/nodeSubplan.h"
@@ -292,6 +293,10 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggEstimate((AggState *) planstate, e->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheEstimate((ResultCacheState *) planstate, e->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -512,6 +517,10 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeDSM((AggState *) planstate, d->pcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeDSM((ResultCacheState *) planstate, d->pcxt);
+			break;
 		default:
 			break;
 	}
@@ -988,6 +997,7 @@ ExecParallelReInitializeDSM(PlanState *planstate,
 		case T_HashState:
 		case T_SortState:
 		case T_IncrementalSortState:
+		case T_ResultCacheState:
 			/* these nodes have DSM state, but no reinitialization is required */
 			break;
 
@@ -1057,6 +1067,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_AggState:
 			ExecAggRetrieveInstrumentation((AggState *) planstate);
 			break;
+		case T_ResultCacheState:
+			ExecResultCacheRetrieveInstrumentation((ResultCacheState *) planstate);
+			break;
 		default:
 			break;
 	}
@@ -1349,6 +1362,11 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			/* even when not parallel-aware, for EXPLAIN ANALYZE */
 			ExecAggInitializeWorker((AggState *) planstate, pwcxt);
 			break;
+		case T_ResultCacheState:
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecResultCacheInitializeWorker((ResultCacheState *) planstate,
+											pwcxt);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 29766d8196..9f8c7582e0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
 #include "executor/nodeProjectSet.h"
 #include "executor/nodeRecursiveunion.h"
 #include "executor/nodeResult.h"
+#include "executor/nodeResultCache.h"
 #include "executor/nodeSamplescan.h"
 #include "executor/nodeSeqscan.h"
 #include "executor/nodeSetOp.h"
@@ -325,6 +326,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														   estate, eflags);
 			break;
 
+		case T_ResultCache:
+			result = (PlanState *) ExecInitResultCache((ResultCache *) node,
+													   estate, eflags);
+			break;
+
 		case T_Group:
 			result = (PlanState *) ExecInitGroup((Group *) node,
 												 estate, eflags);
@@ -713,6 +719,10 @@ ExecEndNode(PlanState *node)
 			ExecEndIncrementalSort((IncrementalSortState *) node);
 			break;
 
+		case T_ResultCacheState:
+			ExecEndResultCache((ResultCacheState *) node);
+			break;
+
 		case T_GroupState:
 			ExecEndGroup((GroupState *) node);
 			break;
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
new file mode 100644
index 0000000000..906b68c945
--- /dev/null
+++ b/src/backend/executor/nodeResultCache.c
@@ -0,0 +1,1137 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.c
+ *	  Routines to handle caching of results from parameterized nodes
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeResultCache.c
+ *
+ * ResultCache nodes are intended to sit above parameterized nodes in the plan
+ * tree in order to cache results from them.  The intention here is that a
+ * repeat scan with a parameter value that has already been seen by the node
+ * can fetch tuples from the cache rather than having to re-scan the outer
+ * node all over again.  The query planner may choose to make use of one of
+ * these when it thinks rescans for previously seen values are likely enough
+ * to warrant adding the additional node.
+ *
+ * The method of cache we use is a hash table.  When the cache fills, we never
+ * spill tuples to disk, instead, we choose to evict the least recently used
+ * cache entry from the cache.  We remember the least recently used entry by
+ * always pushing new entries and entries we look for onto the tail of a
+ * doubly linked list.  This means that the least recently used items migrate
+ * towards the head of this LRU list, which is where eviction begins.
+ *
+ * Sometimes our callers won't run their scans to completion. For example a
+ * semi-join only needs to run until it finds a matching tuple, and once it
+ * does, the join operator skips to the next outer tuple and does not execute
+ * the inner side again on that scan.  Because of this, we must keep track of
+ * when a cache entry is complete, and by default, we know it is when we run
+ * out of tuples to read during the scan.  However, there are cases where we
+ * can mark the cache entry as complete without exhausting the scan of all
+ * tuples.  One case is unique joins, where the join operator knows that there
+ * will only be at most one match for any given outer tuple.  In order to
+ * support such cases we allow the "singlerow" option to be set for the cache.
+ * This option marks the cache entry as complete after we read the first tuple
+ * from the subnode.
+ *
+ * It's possible when we're filling the cache for a given set of parameters
+ * that we're unable to free enough memory to store any more tuples.  If this
+ * happens then we'll have already evicted all other cache entries.  When
+ * caching another tuple would cause us to exceed our memory budget, we must
+ * free the entry that we're currently populating and move the state machine
+ * into RC_CACHE_BYPASS_MODE.  This means that we'll not attempt to cache any
+ * further tuples for this particular scan.  We don't have the memory for it.
+ * The state machine will be reset again on the next rescan.  If the memory
+ * requirements to cache the next parameter's tuples are less demanding, then
+ * that may allow us to start putting useful entries back into the cache
+ * again.
+ *
+ *
+ * INTERFACE ROUTINES
+ *		ExecResultCache			- lookup cache, exec subplan when not found
+ *		ExecInitResultCache		- initialize node and subnodes
+ *		ExecEndResultCache		- shutdown node and subnodes
+ *		ExecReScanResultCache	- rescan the result cache
+ *
+ *		ExecResultCacheEstimate		estimates DSM space needed for parallel plan
+ *		ExecResultCacheInitializeDSM initialize DSM for parallel plan
+ *		ExecResultCacheInitializeWorker attach to DSM info in parallel worker
+ *		ExecResultCacheRetrieveInstrumentation get instrumentation from worker
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "common/hashfn.h"
+#include "executor/executor.h"
+#include "executor/nodeResultCache.h"
+#include "lib/ilist.h"
+#include "miscadmin.h"
+#include "utils/lsyscache.h"
+
+/* States of the ExecResultCache state machine */
+#define RC_CACHE_LOOKUP				1	/* Attempt to perform a cache lookup */
+#define RC_CACHE_FETCH_NEXT_TUPLE	2	/* Get another tuple from the cache */
+#define RC_FILLING_CACHE			3	/* Read outer node to fill cache */
+#define RC_CACHE_BYPASS_MODE		4	/* Bypass mode.  Just read from our
+										 * subplan without caching anything */
+#define RC_END_OF_SCAN				5	/* Ready for rescan */
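+
+/*
+ * Summary of the state transitions implemented by ExecResultCache(): each
+ * (re)scan starts in RC_CACHE_LOOKUP.  A cache hit on a complete entry moves
+ * to RC_CACHE_FETCH_NEXT_TUPLE, or straight to RC_END_OF_SCAN when the entry
+ * holds no tuples.  A cache miss moves to RC_FILLING_CACHE, or to
+ * RC_CACHE_BYPASS_MODE when a tuple cannot be stored within the memory
+ * budget.  Every state ends in RC_END_OF_SCAN once tuples are exhausted, and
+ * ExecReScanResultCache() resets the machine back to RC_CACHE_LOOKUP.
+ */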
+
+
+/* Helper macros for memory accounting */
+#define EMPTY_ENTRY_MEMORY_BYTES(e)		(sizeof(ResultCacheEntry) + \
+										 sizeof(ResultCacheKey) + \
+										 (e)->key->params->t_len)
+#define CACHE_TUPLE_BYTES(t)			(sizeof(ResultCacheTuple) + \
+										 (t)->mintuple->t_len)
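+
+/*
+ * EMPTY_ENTRY_MEMORY_BYTES charges for the cache entry, its key struct and
+ * the key's MinimalTuple, while CACHE_TUPLE_BYTES charges for the list link
+ * plus the cached MinimalTuple.  These amounts are what get added to and
+ * subtracted from rcstate->mem_used as entries and tuples come and go.
+ */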
+
+/*
+ * ResultCacheTuple
+ *		Stores an individually cached tuple
+ */
+typedef struct ResultCacheTuple
+{
+	MinimalTuple mintuple;		/* Cached tuple */
+	struct ResultCacheTuple *next;	/* The next tuple with the same parameter
+									 * values or NULL if it's the last one */
+} ResultCacheTuple;
+
+/*
+ * ResultCacheKey
+ * The hash table key for cached entries plus the LRU list link
+ */
+typedef struct ResultCacheKey
+{
+	MinimalTuple params;
+	dlist_node	lru_node;		/* Pointer to next/prev key in LRU list */
+} ResultCacheKey;
+
+/*
+ * ResultCacheEntry
+ *		The data struct that the cache hash table stores
+ */
+typedef struct ResultCacheEntry
+{
+	ResultCacheKey *key;		/* Hash key for hash table lookups */
+	ResultCacheTuple *tuplehead;	/* Pointer to the first tuple or NULL if
+									 * no tuples are cached for this entry */
+	uint32		hash;			/* Hash value (cached) */
+	char		status;			/* Hash status */
+	bool		complete;		/* Did we read the outer plan to completion? */
+} ResultCacheEntry;
+
+
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32 ResultCacheHash_hash(struct resultcache_hash *tb,
+								   const ResultCacheKey *key);
+static int	ResultCacheHash_equal(struct resultcache_hash *tb,
+								  const ResultCacheKey *params1,
+								  const ResultCacheKey *params2);
+
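+/*
+ * Now define the hash table functions.  The simplehash.h inclusion above,
+ * with SH_DECLARE, only declared the resultcache_hash type and the function
+ * prototypes; this second inclusion, with SH_DEFINE, emits the definitions
+ * now that the hash and equality support functions have been declared.
+ */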
+#define SH_PREFIX resultcache
+#define SH_ELEMENT_TYPE ResultCacheEntry
+#define SH_KEY_TYPE ResultCacheKey *
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) ResultCacheHash_hash(tb, key)
+#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * ResultCacheHash_hash
+ *		Hash function for simplehash hashtable.  'key' is unused here as we
+ *		require that all table lookups first populate the ResultCacheState's
+ *		probeslot with the key values to be looked up.
+ */
+static uint32
+ResultCacheHash_hash(struct resultcache_hash *tb, const ResultCacheKey *key)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	TupleTableSlot *pslot = rcstate->probeslot;
+	uint32		hashkey = 0;
+	int			numkeys = rcstate->nkeys;
+	FmgrInfo   *hashfunctions = rcstate->hashfunctions;
+	Oid		   *collations = rcstate->collations;
+
+	for (int i = 0; i < numkeys; i++)
+	{
+		/* rotate hashkey left 1 bit at each step */
+		hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
+
+		if (!pslot->tts_isnull[i])	/* treat nulls as having hash key 0 */
+		{
+			uint32		hkey;
+
+			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
+													collations[i], pslot->tts_values[i]));
+			hashkey ^= hkey;
+		}
+	}
+
+	return murmurhash32(hashkey);
+}
+
+/*
+ * ResultCacheHash_equal
+ *		Equality function for confirming hash value matches during a hash
+ *		table lookup.  'key2' is never used.  Instead the ResultCacheState's
+ *		probeslot is always populated with details of what's being looked up.
+ */
+static int
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+					  const ResultCacheKey *key2)
+{
+	ResultCacheState *rcstate = (ResultCacheState *) tb->private_data;
+	ExprContext *econtext = rcstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	TupleTableSlot *pslot = rcstate->probeslot;
+
+	/* probeslot should have already been prepared by prepare_probe_slot() */
+
+	ExecStoreMinimalTuple(key1->params, tslot, false);
+
+	econtext->ecxt_innertuple = tslot;
+	econtext->ecxt_outertuple = pslot;
+	return !ExecQualAndReset(rcstate->cache_eq_expr, econtext);
+}
+
+/*
+ * Initialize the hash table to empty.
+ */
+static void
+build_hash_table(ResultCacheState *rcstate, uint32 size)
+{
+	/* Make a guess at a good size when we're not given a valid size. */
+	if (size == 0)
+		size = 1024;
+
+	/* resultcache_create will convert the size to a power of 2 */
+	rcstate->hashtable = resultcache_create(rcstate->tableContext, size,
+											rcstate);
+}
+
+/*
+ * prepare_probe_slot
+ *		Populate rcstate's probeslot with the values from the tuple stored
+ *		in 'key'.  If 'key' is NULL, then perform the population by evaluating
+ *		rcstate's param_exprs.
+ */
+static inline void
+prepare_probe_slot(ResultCacheState *rcstate, ResultCacheKey *key)
+{
+	TupleTableSlot *pslot = rcstate->probeslot;
+	TupleTableSlot *tslot = rcstate->tableslot;
+	int			numKeys = rcstate->nkeys;
+
+	ExecClearTuple(pslot);
+
+	if (key == NULL)
+	{
+		/* Set the probeslot's values based on the current parameter values */
+		for (int i = 0; i < numKeys; i++)
+			pslot->tts_values[i] = ExecEvalExpr(rcstate->param_exprs[i],
+												rcstate->ss.ps.ps_ExprContext,
+												&pslot->tts_isnull[i]);
+	}
+	else
+	{
+		/* Process the key's MinimalTuple and store the values in probeslot */
+		ExecStoreMinimalTuple(key->params, tslot, false);
+		slot_getallattrs(tslot);
+		memcpy(pslot->tts_values, tslot->tts_values, sizeof(Datum) * numKeys);
+		memcpy(pslot->tts_isnull, tslot->tts_isnull, sizeof(bool) * numKeys);
+	}
+
+	ExecStoreVirtualTuple(pslot);
+}
+
+/*
+ * entry_purge_tuples
+ *		Remove all tuples from the cache entry pointed to by 'entry'.  This
+ *		leaves an empty cache entry.  Also, update the memory accounting to
+ *		reflect the removal of the tuples.
+ */
+static inline void
+entry_purge_tuples(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheTuple *tuple = entry->tuplehead;
+	uint64		freed_mem = 0;
+
+	while (tuple != NULL)
+	{
+		ResultCacheTuple *next = tuple->next;
+
+		freed_mem += CACHE_TUPLE_BYTES(tuple);
+
+		/* Free memory used for this tuple */
+		pfree(tuple->mintuple);
+		pfree(tuple);
+
+		tuple = next;
+	}
+
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/* Update the memory accounting */
+	rcstate->mem_used -= freed_mem;
+
+	Assert(rcstate->mem_used >= 0);
+}
+
+/*
+ * remove_cache_entry
+ *		Remove 'entry' from the cache and free memory used by it.
+ */
+static void
+remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
+{
+	ResultCacheKey *key = entry->key;
+
+	dlist_delete(&entry->key->lru_node);
+
+#ifdef USE_ASSERT_CHECKING
+	/*
+	 * Validate the memory accounting code is correct in assert builds. XXX is
+	 * this too expensive for USE_ASSERT_CHECKING?
+	 */
+	{
+		int			i,
+					count;
+		uint64		mem = 0;
+
+		count = 0;
+		for (i = 0; i < rcstate->hashtable->size; i++)
+		{
+			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
+
+			if (entry->status == resultcache_SH_IN_USE)
+			{
+				ResultCacheTuple *tuple = entry->tuplehead;
+
+				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+				while (tuple != NULL)
+				{
+					mem += CACHE_TUPLE_BYTES(tuple);
+					tuple = tuple->next;
+				}
+				count++;
+			}
+		}
+
+		Assert(count == rcstate->hashtable->members);
+		Assert(mem == rcstate->mem_used);
+	}
+#endif
+
+	/* Remove all of the tuples from this entry */
+	entry_purge_tuples(rcstate, entry);
+
+	/*
+	 * Update memory accounting. entry_purge_tuples should have already
+	 * subtracted the memory used for each cached tuple.  Here we just update
+	 * the amount used by the entry itself.
+	 */
+	rcstate->mem_used -= EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	Assert(rcstate->mem_used >= 0);
+
+	/* Remove the entry from the cache */
+	resultcache_delete_item(rcstate->hashtable, entry);
+
+	pfree(key->params);
+	pfree(key);
+}
+
+/*
+ * cache_reduce_memory
+ *		Evict older and less recently used items from the cache in order to
+ *		reduce the memory consumption back to something below the
+ *		ResultCacheState's mem_limit.
+ *
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * which the key belongs to is removed from the cache.
+ */
+static bool
+cache_reduce_memory(ResultCacheState *rcstate, ResultCacheKey *specialkey)
+{
+	bool		specialkey_intact = true;	/* for now */
+	dlist_mutable_iter iter;
+	uint64		evictions = 0;
+
+	/* Update peak memory usage */
+	if (rcstate->mem_used > rcstate->stats.mem_peak)
+		rcstate->stats.mem_peak = rcstate->mem_used;
+
+	/* We expect only to be called when we've gone over budget on memory */
+	Assert(rcstate->mem_used > rcstate->mem_limit);
+
+	/* Start the eviction process starting at the head of the LRU list. */
+	dlist_foreach_modify(iter, &rcstate->lru_list)
+	{
+		ResultCacheKey *key = dlist_container(ResultCacheKey, lru_node,
+											  iter.cur);
+		ResultCacheEntry *entry;
+
+		/*
+		 * Populate the hash probe slot in preparation for looking up this LRU
+		 * entry.
+		 */
+		prepare_probe_slot(rcstate, key);
+
+		/*
+		 * Ideally the LRU list pointers would be stored in the entry itself
+		 * rather than in the key.  Unfortunately, we can't do that as the
+		 * simplehash.h code may resize the table and allocate new memory for
+		 * entries which would result in those pointers pointing to the old
+		 * buckets.  However, it's fine to use the key to store this as that's
+		 * only referenced by a pointer in the entry, which of course follows
+		 * the entry whenever the hash table is resized.  Since we only have a
+		 * pointer to the key here, we must perform a hash table lookup to
+		 * find the entry that the key belongs to.
+		 */
+		entry = resultcache_lookup(rcstate->hashtable, NULL);
+
+		/* A good spot to check for corruption of the table and LRU list. */
+		Assert(entry != NULL);
+		Assert(entry->key == key);
+
+		/*
+		 * If we're being called to free memory while the cache is being
+		 * populated with new tuples, then we'd better take some care as we
+		 * could end up freeing the entry which 'specialkey' belongs to.
+		 * Generally callers will pass 'specialkey' as the key for the cache
+		 * entry which is currently being populated, so we must set
+		 * 'specialkey_intact' to false to inform the caller the specialkey
+		 * entry has been removed.
+		 */
+		if (key == specialkey)
+			specialkey_intact = false;
+
+		/*
+		 * Finally remove the entry.  This will remove from the LRU list too.
+		 */
+		remove_cache_entry(rcstate, entry);
+
+		evictions++;
+
+		/* Exit if we've freed enough memory */
+		if (rcstate->mem_used <= rcstate->mem_limit)
+			break;
+	}
+
+	rcstate->stats.cache_evictions += evictions;	/* Update Stats */
+
+	return specialkey_intact;
+}
+
+/*
+ * cache_lookup
+ *		Perform a lookup to see if we've already cached results based on the
+ *		scan's current parameters.  If we find an existing entry we move it to
+ *		the end of the LRU list, set *found to true then return it.  If we
+ *		don't find an entry then we create a new one and add it to the end of
+ *		the LRU list.  We also update cache memory accounting and remove older
+ *		entries if we go over the memory budget.  If we managed to free enough
+ *		memory we return the new entry, else we return NULL.
+ *
+ * Callers can assume we'll never return NULL when *found is true.
+ */
+static ResultCacheEntry *
+cache_lookup(ResultCacheState *rcstate, bool *found)
+{
+	ResultCacheKey *key;
+	ResultCacheEntry *entry;
+	MemoryContext oldcontext;
+
+	/* prepare the probe slot with the current scan parameters */
+	prepare_probe_slot(rcstate, NULL);
+
+	/*
+	 * Add the new entry to the cache.  No need to pass a valid key since the
+	 * hash function uses rcstate's probeslot, which we populated above.
+	 */
+	entry = resultcache_insert(rcstate->hashtable, NULL, found);
+
+	if (*found)
+	{
+		/*
+		 * Move existing entry to the tail of the LRU list to mark it as the
+		 * most recently used item.
+		 */
+		dlist_move_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+		return entry;
+	}
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	/* Allocate a new key */
+	entry->key = key = (ResultCacheKey *) palloc(sizeof(ResultCacheKey));
+	key->params = ExecCopySlotMinimalTuple(rcstate->probeslot);
+
+	/* Update the total cache memory utilization */
+	rcstate->mem_used += EMPTY_ENTRY_MEMORY_BYTES(entry);
+
+	/* Initialize this entry */
+	entry->complete = false;
+	entry->tuplehead = NULL;
+
+	/*
+	 * Since this is the most recently used entry, push this entry onto the
+	 * end of the LRU list.
+	 */
+	dlist_push_tail(&rcstate->lru_list, &entry->key->lru_node);
+
+	rcstate->last_tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget, then we'll free up some space in
+	 * the cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		/*
+		 * Try to free up some memory.  It's highly unlikely that we'll fail
+		 * to do so here since the entry we've just added is yet to contain
+		 * any tuples and we're able to remove any other entry to reduce the
+		 * memory consumption.
+		 */
+		if (unlikely(!cache_reduce_memory(rcstate, key)))
+			return NULL;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the newly added entry */
+			entry = resultcache_lookup(rcstate->hashtable, NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return entry;
+}
+
+/*
+ * cache_store_tuple
+ *		Add the tuple stored in 'slot' to the rcstate's current cache entry.
+ *		The cache entry must have already been made with cache_lookup().
+ *		rcstate's last_tuple field must point to the tail of rcstate->entry's
+ *		list of tuples.
+ */
+static bool
+cache_store_tuple(ResultCacheState *rcstate, TupleTableSlot *slot)
+{
+	ResultCacheTuple *tuple;
+	ResultCacheEntry *entry = rcstate->entry;
+	MemoryContext oldcontext;
+
+	Assert(slot != NULL);
+	Assert(entry != NULL);
+
+	oldcontext = MemoryContextSwitchTo(rcstate->tableContext);
+
+	tuple = (ResultCacheTuple *) palloc(sizeof(ResultCacheTuple));
+	tuple->mintuple = ExecCopySlotMinimalTuple(slot);
+	tuple->next = NULL;
+
+	/* Account for the memory we just consumed */
+	rcstate->mem_used += CACHE_TUPLE_BYTES(tuple);
+
+	if (entry->tuplehead == NULL)
+	{
+		/*
+		 * This is the first tuple for this entry, so just point the list head
+		 * to it.
+		 */
+		entry->tuplehead = tuple;
+	}
+	else
+	{
+		/* push this tuple onto the tail of the list */
+		rcstate->last_tuple->next = tuple;
+	}
+
+	rcstate->last_tuple = tuple;
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * If we've gone over our memory budget then free up some space in the
+	 * cache.
+	 */
+	if (rcstate->mem_used > rcstate->mem_limit)
+	{
+		ResultCacheKey *key = entry->key;
+
+		if (!cache_reduce_memory(rcstate, key))
+			return false;
+
+		/*
+		 * The process of removing entries from the cache may have caused the
+		 * code in simplehash.h to shuffle elements to earlier buckets in the
+		 * hash table.  If it has, we'll need to find the entry again by
+		 * performing a lookup.  Fortunately, we can detect if this has
+		 * happened by seeing if the entry is still in use and that the key
+		 * pointer matches our expected key.
+		 */
+		if (entry->status != resultcache_SH_IN_USE || entry->key != key)
+		{
+			/*
+			 * We need to repopulate the probeslot as lookups performed during
+			 * the cache evictions above will have stored some other key.
+			 */
+			prepare_probe_slot(rcstate, key);
+
+			/* Re-find the entry */
+			rcstate->entry = entry = resultcache_lookup(rcstate->hashtable,
+														NULL);
+			Assert(entry != NULL);
+		}
+	}
+
+	return true;
+}
+
+static TupleTableSlot *
+ExecResultCache(PlanState *pstate)
+{
+	ResultCacheState *node = castNode(ResultCacheState, pstate);
+	PlanState  *outerNode;
+	TupleTableSlot *slot;
+
+	switch (node->rc_status)
+	{
+		case RC_CACHE_LOOKUP:
+			{
+				ResultCacheEntry *entry;
+				TupleTableSlot *outerslot;
+				bool		found;
+
+				Assert(node->entry == NULL);
+
+				/*
+				 * We're only ever in this state for the first call of the
+				 * scan.  Here we have a look to see if we've already seen the
+				 * current parameters before and if we have already cached a
+				 * complete set of records that the outer plan will return for
+				 * these parameters.
+				 *
+				 * When we find a valid cache entry, we'll return the first
+				 * tuple from it. If not found, we'll create a cache entry and
+				 * then try to fetch a tuple from the outer scan.  If we find
+				 * one there, we'll try to cache it.
+				 */
+
+				/* see if we've got anything cached for the current parameters */
+				entry = cache_lookup(node, &found);
+
+				if (found && entry->complete)
+				{
+					node->stats.cache_hits += 1;	/* stats update */
+
+					/*
+					 * Set last_tuple and entry so that the state
+					 * RC_CACHE_FETCH_NEXT_TUPLE can easily find the next
+					 * tuple for these parameters.
+					 */
+					node->last_tuple = entry->tuplehead;
+					node->entry = entry;
+
+					/* Fetch the first cached tuple, if there is one */
+					if (entry->tuplehead)
+					{
+						node->rc_status = RC_CACHE_FETCH_NEXT_TUPLE;
+
+						slot = node->ss.ps.ps_ResultTupleSlot;
+						ExecStoreMinimalTuple(entry->tuplehead->mintuple,
+											  slot, false);
+
+						return slot;
+					}
+
+					/* The cache entry is void of any tuples. */
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/* Handle cache miss */
+				node->stats.cache_misses += 1;	/* stats update */
+
+				if (found)
+				{
+					/*
+					 * A cache entry was found, but the scan for that entry
+					 * did not run to completion.  We'll just remove all
+					 * tuples and start again.  It might be tempting to
+					 * continue where we left off, but there's no guarantee
+					 * the outer node will produce the tuples in the same
+					 * order as it did last time.
+					 */
+					entry_purge_tuples(node, entry);
+				}
+
+				/* Scan the outer node for a tuple to cache */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/*
+					 * cache_lookup may have returned NULL due to failure to
+					 * free enough cache space, so ensure we don't do anything
+					 * here that assumes it worked. There's no need to go into
+					 * bypass mode here as we're setting rc_status to end of
+					 * scan.
+					 */
+					if (likely(entry))
+						entry->complete = true;
+
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				node->entry = entry;
+
+				/*
+				 * If we failed to create the entry or failed to store the
+				 * tuple in the entry, then go into bypass mode.
+				 */
+				if (unlikely(entry == NULL ||
+					!cache_store_tuple(node, outerslot)))
+				{
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out last_tuple as we'll stay in bypass
+					 * mode until the end of the scan.
+					 */
+				}
+				else
+				{
+					/*
+					 * If we only expect a single row from this scan then we
+					 * can mark that we're not expecting more.  This allows
+					 * cache lookups to work even when the scan has not been
+					 * executed to completion.
+					 */
+					entry->complete = node->singlerow;
+					node->rc_status = RC_FILLING_CACHE;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_FETCH_NEXT_TUPLE:
+			{
+				/* We shouldn't be in this state if these are not set */
+				Assert(node->entry != NULL);
+				Assert(node->last_tuple != NULL);
+
+				/* Skip to the next tuple to output */
+				node->last_tuple = node->last_tuple->next;
+
+				/* No more tuples in the cache */
+				if (node->last_tuple == NULL)
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecStoreMinimalTuple(node->last_tuple->mintuple, slot,
+									  false);
+
+				return slot;
+			}
+
+		case RC_FILLING_CACHE:
+			{
+				TupleTableSlot *outerslot;
+				ResultCacheEntry *entry = node->entry;
+
+				/* entry should already have been set by RC_CACHE_LOOKUP */
+				Assert(entry != NULL);
+
+				/*
+				 * When in the RC_FILLING_CACHE state, we've just had a cache
+				 * miss and are populating the cache with the current scan
+				 * tuples.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					/* No more tuples.  Mark it as complete */
+					entry->complete = true;
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				/*
+				 * Validate if the planner properly set the singlerow flag.
+				 * It should only set that if each cache entry can, at most,
+				 * return 1 row.  XXX maybe this should be an Assert?
+				 */
+				if (unlikely(entry->complete))
+					elog(ERROR, "cache entry already complete");
+
+				/* Record the tuple in the current cache entry */
+				if (unlikely(!cache_store_tuple(node, outerslot)))
+				{
+					/* Couldn't store it?  Handle overflow */
+					node->stats.cache_overflows += 1;	/* stats update */
+
+					node->rc_status = RC_CACHE_BYPASS_MODE;
+
+					/*
+					 * No need to clear out entry or last_tuple as we'll stay
+					 * in bypass mode until the end of the scan.
+					 */
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_CACHE_BYPASS_MODE:
+			{
+				TupleTableSlot *outerslot;
+
+				/*
+				 * When in bypass mode we just continue to read tuples without
+				 * caching.  We need to wait until the next rescan before we
+				 * can come out of this mode.
+				 */
+				outerNode = outerPlanState(node);
+				outerslot = ExecProcNode(outerNode);
+				if (TupIsNull(outerslot))
+				{
+					node->rc_status = RC_END_OF_SCAN;
+					return NULL;
+				}
+
+				slot = node->ss.ps.ps_ResultTupleSlot;
+				ExecCopySlot(slot, outerslot);
+				return slot;
+			}
+
+		case RC_END_OF_SCAN:
+
+			/*
+			 * We've already returned NULL for this scan, but just in case
+			 * something calls us again by mistake.
+			 */
+			return NULL;
+
+		default:
+			elog(ERROR, "unrecognized resultcache state: %d",
+				 (int) node->rc_status);
+			return NULL;
+	}							/* switch */
+}
+
+ResultCacheState *
+ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
+{
+	ResultCacheState *rcstate = makeNode(ResultCacheState);
+	Plan	   *outerNode;
+	int			i;
+	int			nkeys;
+	Oid		   *eqfuncoids;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	rcstate->ss.ps.plan = (Plan *) node;
+	rcstate->ss.ps.state = estate;
+	rcstate->ss.ps.ExecProcNode = ExecResultCache;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &rcstate->ss.ps);
+
+	outerNode = outerPlan(node);
+	outerPlanState(rcstate) = ExecInitNode(outerNode, estate, eflags);
+
+	/*
+	 * Initialize return slot and type. No need to initialize projection info
+	 * because this node doesn't do projections.
+	 */
+	ExecInitResultTupleSlotTL(&rcstate->ss.ps, &TTSOpsMinimalTuple);
+	rcstate->ss.ps.ps_ProjInfo = NULL;
+
+	/*
+	 * Initialize scan slot and type.
+	 */
+	ExecCreateScanSlotFromOuterPlan(estate, &rcstate->ss, &TTSOpsMinimalTuple);
+
+	/*
+	 * Set the state machine to lookup the cache.  We won't find anything
+	 * until we cache something, but this saves a special case to create the
+	 * first entry.
+	 */
+	rcstate->rc_status = RC_CACHE_LOOKUP;
+
+	rcstate->nkeys = nkeys = node->numKeys;
+	rcstate->hashkeydesc = ExecTypeFromExprList(node->param_exprs);
+	rcstate->tableslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsMinimalTuple);
+	rcstate->probeslot = MakeSingleTupleTableSlot(rcstate->hashkeydesc,
+												  &TTSOpsVirtual);
+
+	rcstate->param_exprs = (ExprState **) palloc(nkeys * sizeof(ExprState *));
+	rcstate->collations = node->collations; /* Just point directly to the plan
+											 * data */
+	rcstate->hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
+
+	eqfuncoids = palloc(nkeys * sizeof(Oid));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		Oid			hashop = node->hashOperators[i];
+		Oid			left_hashfn;
+		Oid			right_hashfn;
+		Expr	   *param_expr = (Expr *) list_nth(node->param_exprs, i);
+
+		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
+			elog(ERROR, "could not find hash function for hash operator %u",
+				 hashop);
+
+		fmgr_info(left_hashfn, &rcstate->hashfunctions[i]);
+
+		rcstate->param_exprs[i] = ExecInitExpr(param_expr, (PlanState *) rcstate);
+		eqfuncoids[i] = get_opcode(hashop);
+	}
+
+	rcstate->cache_eq_expr = ExecBuildParamSetEqual(rcstate->hashkeydesc,
+													&TTSOpsMinimalTuple,
+													&TTSOpsVirtual,
+													eqfuncoids,
+													node->collations,
+													node->param_exprs,
+													(PlanState *) rcstate);
+
+	pfree(eqfuncoids);
+	rcstate->mem_used = 0;
+
+	/* Limit the total memory consumed by the cache to this */
+	rcstate->mem_limit = get_hash_mem() * 1024L;
+
+	/* A memory context dedicated for the cache */
+	rcstate->tableContext = AllocSetContextCreate(CurrentMemoryContext,
+												  "ResultCacheHashTable",
+												  ALLOCSET_DEFAULT_SIZES);
+
+	dlist_init(&rcstate->lru_list);
+	rcstate->last_tuple = NULL;
+	rcstate->entry = NULL;
+
+	/*
+	 * Mark if we can assume the cache entry is complete after we get the
+	 * first record for it.  Some callers might not call us again after
+	 * getting the first match.  For example, a join operator performing a
+	 * unique join can skip to the next outer tuple after getting the first
+	 * matching inner tuple.  In this case, the cache entry is complete after
+	 * getting the first tuple, so we can mark it as such.
+	 */
+	rcstate->singlerow = node->singlerow;
+
+	/* Zero the statistics counters */
+	memset(&rcstate->stats, 0, sizeof(ResultCacheInstrumentation));
+
+	/* Allocate and set up the actual cache */
+	build_hash_table(rcstate, node->est_entries);
+
+	return rcstate;
+}
+
+void
+ExecEndResultCache(ResultCacheState *node)
+{
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->shared_info != NULL && IsParallelWorker())
+	{
+		ResultCacheInstrumentation *si;
+
+		/* Make mem_peak available for EXPLAIN */
+		if (node->stats.mem_peak == 0)
+			node->stats.mem_peak = node->mem_used;
+
+		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
+		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
+		memcpy(si, &node->stats, sizeof(ResultCacheInstrumentation));
+	}
+
+	/* Remove the cache context */
+	MemoryContextDelete(node->tableContext);
+
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	/* must drop pointer to cache result tuple */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+
+	/*
+	 * free exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * shut down the subplan
+	 */
+	ExecEndNode(outerPlanState(node));
+}
+
+void
+ExecReScanResultCache(ResultCacheState *node)
+{
+	PlanState  *outerPlan = outerPlanState(node);
+
+	/* Mark that we must lookup the cache for a new set of parameters */
+	node->rc_status = RC_CACHE_LOOKUP;
+
+	/* nullify pointers used for the last scan */
+	node->entry = NULL;
+	node->last_tuple = NULL;
+
+	/*
+	 * if chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+}
+
+/*
+ * ExecEstimateCacheEntryOverheadBytes
+ *		For use in the query planner to help it estimate the amount of memory
+ *		required to store a single entry in the cache.
+ */
+double
+ExecEstimateCacheEntryOverheadBytes(double ntuples)
+{
+	return sizeof(ResultCacheEntry) + sizeof(ResultCacheKey) +
+		sizeof(ResultCacheTuple) * ntuples;
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Query Support
+ * ----------------------------------------------------------------
+ */
+
+ /* ----------------------------------------------------------------
+  *		ExecResultCacheEstimate
+  *
+  *		Estimate space required to propagate result cache statistics.
+  * ----------------------------------------------------------------
+  */
+void
+ExecResultCacheEstimate(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = mul_size(pcxt->nworkers, sizeof(ResultCacheInstrumentation));
+	size = add_size(size, offsetof(SharedResultCacheInfo, sinstrument));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeDSM
+ *
+ *		Initialize DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeDSM(ResultCacheState *node, ParallelContext *pcxt)
+{
+	Size		size;
+
+	/* don't need this if not instrumenting or no workers */
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ pcxt->nworkers * sizeof(ResultCacheInstrumentation);
+	node->shared_info = shm_toc_allocate(pcxt->toc, size);
+	/* ensure any unfilled slots will contain zeroes */
+	memset(node->shared_info, 0, size);
+	node->shared_info->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id,
+				   node->shared_info);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheInitializeWorker
+ *
+ *		Attach worker to DSM space for result cache statistics.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheInitializeWorker(ResultCacheState *node, ParallelWorkerContext *pwcxt)
+{
+	node->shared_info =
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, true);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecResultCacheRetrieveInstrumentation
+ *
+ *		Transfer result cache statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecResultCacheRetrieveInstrumentation(ResultCacheState *node)
+{
+	Size		size;
+	SharedResultCacheInfo *si;
+
+	if (node->shared_info == NULL)
+		return;
+
+	size = offsetof(SharedResultCacheInfo, sinstrument)
+		+ node->shared_info->num_workers * sizeof(ResultCacheInstrumentation);
+	si = palloc(size);
+	memcpy(si, node->shared_info, size);
+	node->shared_info = si;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 44c7fce20a..ad729d10a8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -948,6 +948,33 @@ _copyMaterial(const Material *from)
 }
 
 
+/*
+ * _copyResultCache
+ */
+static ResultCache *
+_copyResultCache(const ResultCache *from)
+{
+	ResultCache *newnode = makeNode(ResultCache);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_SCALAR_FIELD(numKeys);
+	COPY_POINTER_FIELD(hashOperators, sizeof(Oid) * from->numKeys);
+	COPY_POINTER_FIELD(collations, sizeof(Oid) * from->numKeys);
+	COPY_NODE_FIELD(param_exprs);
+	COPY_SCALAR_FIELD(singlerow);
+	COPY_SCALAR_FIELD(est_entries);
+
+	return newnode;
+}
+
+
 /*
  * CopySortFields
  *
@@ -2340,6 +2367,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_SCALAR_FIELD(right_bucketsize);
 	COPY_SCALAR_FIELD(left_mcvfreq);
 	COPY_SCALAR_FIELD(right_mcvfreq);
+	COPY_SCALAR_FIELD(hasheqoperator);
 
 	return newnode;
 }
@@ -5024,6 +5052,9 @@ copyObjectImpl(const void *from)
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
+		case T_ResultCache:
+			retval = _copyResultCache(from);
+			break;
 		case T_Sort:
 			retval = _copySort(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 785465d8c4..fa8f65fbc5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -846,6 +846,21 @@ _outMaterial(StringInfo str, const Material *node)
 	_outPlanInfo(str, (const Plan *) node);
 }
 
+static void
+_outResultCache(StringInfo str, const ResultCache *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHE");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_INT_FIELD(numKeys);
+	WRITE_OID_ARRAY(hashOperators, node->numKeys);
+	WRITE_OID_ARRAY(collations, node->numKeys);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outSortInfo(StringInfo str, const Sort *node)
 {
@@ -1920,6 +1935,21 @@ _outMaterialPath(StringInfo str, const MaterialPath *node)
 	WRITE_NODE_FIELD(subpath);
 }
 
+static void
+_outResultCachePath(StringInfo str, const ResultCachePath *node)
+{
+	WRITE_NODE_TYPE("RESULTCACHEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_NODE_FIELD(hash_operators);
+	WRITE_NODE_FIELD(param_exprs);
+	WRITE_BOOL_FIELD(singlerow);
+	WRITE_FLOAT_FIELD(calls, "%.0f");
+	WRITE_UINT_FIELD(est_entries);
+}
+
 static void
 _outUniquePath(StringInfo str, const UniquePath *node)
 {
@@ -2521,6 +2551,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_NODE_FIELD(right_em);
 	WRITE_BOOL_FIELD(outer_is_left);
 	WRITE_OID_FIELD(hashjoinoperator);
+	WRITE_OID_FIELD(hasheqoperator);
 }
 
 static void
@@ -3907,6 +3938,9 @@ outNode(StringInfo str, const void *obj)
 			case T_Material:
 				_outMaterial(str, obj);
 				break;
+			case T_ResultCache:
+				_outResultCache(str, obj);
+				break;
 			case T_Sort:
 				_outSort(str, obj);
 				break;
@@ -4141,6 +4175,9 @@ outNode(StringInfo str, const void *obj)
 			case T_MaterialPath:
 				_outMaterialPath(str, obj);
 				break;
+			case T_ResultCachePath:
+				_outResultCachePath(str, obj);
+				break;
 			case T_UniquePath:
 				_outUniquePath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a6e723a273..ecce23b747 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2211,6 +2211,26 @@ _readMaterial(void)
 	READ_DONE();
 }
 
+/*
+ * _readResultCache
+ */
+static ResultCache *
+_readResultCache(void)
+{
+	READ_LOCALS(ResultCache);
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_INT_FIELD(numKeys);
+	READ_OID_ARRAY(hashOperators, local_node->numKeys);
+	READ_OID_ARRAY(collations, local_node->numKeys);
+	READ_NODE_FIELD(param_exprs);
+	READ_BOOL_FIELD(singlerow);
+	READ_UINT_FIELD(est_entries);
+
+	READ_DONE();
+}
+
 /*
  * ReadCommonSort
  *	Assign the basic stuff of all nodes that inherit from Sort
@@ -2899,6 +2919,8 @@ parseNodeString(void)
 		return_value = _readHashJoin();
 	else if (MATCH("MATERIAL", 8))
 		return_value = _readMaterial();
+	else if (MATCH("RESULTCACHE", 11))
+		return_value = _readResultCache();
 	else if (MATCH("SORT", 4))
 		return_value = _readSort();
 	else if (MATCH("INCREMENTALSORT", 15))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f34399e3ec..3c9520d00a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4032,6 +4032,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Material";
 			subpath = ((MaterialPath *) path)->subpath;
 			break;
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 0c016a03dd..05686d0194 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -79,6 +79,7 @@
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeResultCache.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -139,6 +140,7 @@ bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
+bool		enable_resultcache = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
@@ -2402,6 +2404,147 @@ cost_material(Path *path,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_resultcache_rescan
+ *	  Determines the estimated cost of rescanning a ResultCache node.
+ *
+ * In order to estimate this, we must gain knowledge of how often we expect to
+ * be called and how many distinct sets of parameters we are likely to be
+ * called with. If we expect a good cache hit ratio, then we can set our
+ * costs to account for that hit ratio, plus a little bit of cost for the
+ * caching itself.  Caching will not work out well if we expect to be called
+ * with too many distinct parameter values.  The worst-case here is that we
+ * never see any parameter value twice, in which case we'd never get a cache
+ * hit and caching would be a complete waste of effort.
+ */
+static void
+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+						Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+	EstimationInfo estinfo;
+	Cost		input_startup_cost = rcpath->subpath->startup_cost;
+	Cost		input_total_cost = rcpath->subpath->total_cost;
+	double		tuples = rcpath->subpath->rows;
+	double		calls = rcpath->calls;
+	int			width = rcpath->subpath->pathtarget->width;
+
+	double		hash_mem_bytes;
+	double		est_entry_bytes;
+	double		est_cache_entries;
+	double		ndistinct;
+	double		evict_ratio;
+	double		hit_ratio;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* available cache space */
+	hash_mem_bytes = get_hash_mem() * 1024L;
+
+	/*
+	 * Set the number of bytes each cache entry should consume in the cache.
+	 * To provide us with better estimations on how many cache entries we can
+	 * store at once, we make a call to the executor here to ask it what
+	 * memory overheads there are for a single cache entry.
+	 *
+	 * XXX we also store the cache key, but that's not accounted for here.
+	 */
+	est_entry_bytes = relation_byte_size(tuples, width) +
+		ExecEstimateCacheEntryOverheadBytes(tuples);
+
+	/* estimate on the upper limit of cache entries we can hold at once */
+	est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
+
+	/* estimate on the distinct number of parameter values */
+	ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+									&estinfo);
+
+	/*
+	 * When the estimation fell back on using a default value, it's a bit too
+	 * risky to assume that it's ok to use a Result Cache.  The use of a
+	 * default could cause us to use a Result Cache when it's really
+	 * inappropriate to do so.  If we see that this has been done, then we'll
+	 * assume that every call will have unique parameters, which will almost
+	 * certainly mean a ResultCachePath will never survive add_path().
+	 */
+	if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
+		ndistinct = calls;
+
+	/*
+	 * Since we've already estimated the maximum number of entries we can
+	 * store at once and know the estimated number of distinct values we'll be
+	 * called with, we'll take this opportunity to set the path's est_entries.
+	 * This will ultimately determine the hash table size that the executor
+	 * will use.  If we leave this at zero, the executor will just choose the
+	 * size itself.  Really this is not the right place to do this, but it's
+	 * convenient since everything is already calculated.
+	 */
+	rcpath->est_entries = Min(Min(ndistinct, est_cache_entries),
+							  PG_UINT32_MAX);
+
+	/*
+	 * When the number of distinct parameter values is above the amount we can
+	 * store in the cache, then we'll have to evict some entries from the
+	 * cache.  This is not free. Here we estimate how often we'll incur the
+	 * cost of that eviction.
+	 */
+	evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
+
+	/*
+	 * In order to estimate how costly a single scan will be, we need to
+	 * attempt to estimate what the cache hit ratio will be.  To do that we
+	 * must look at how many scans are estimated in total for this node and
+	 * how many of those scans we expect to get a cache hit.
+	 */
+	hit_ratio = 1.0 / ndistinct * Min(est_cache_entries, ndistinct) -
+		(ndistinct / calls);
+
+	/* Ensure we don't go negative */
+	hit_ratio = Max(hit_ratio, 0.0);
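+
+	/*
+	 * For example, with calls = 1000, ndistinct = 10 and est_cache_entries
+	 * >= 10, evict_ratio comes out at 0 and hit_ratio at 10/10 - 10/1000 =
+	 * 0.99, i.e. roughly 99% of rescans are expected to be served from the
+	 * cache.
+	 */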
+
+	/*
+	 * Set the total_cost accounting for the expected cache hit ratio.  We
+	 * also add on a cpu_operator_cost to account for a cache lookup. This
+	 * will happen regardless of whether it's a cache hit or not.
+	 */
+	total_cost = input_total_cost * (1.0 - hit_ratio) + cpu_operator_cost;
+
+	/* Now adjust the total cost to account for cache evictions */
+
+	/* Charge a cpu_tuple_cost for evicting the actual cache entry */
+	total_cost += cpu_tuple_cost * evict_ratio;
+
+	/*
+	 * Charge a 10th of cpu_operator_cost to evict every tuple in that entry.
+	 * The per-tuple eviction is really just a pfree, so charging a whole
+	 * cpu_operator_cost seems a little excessive.
+	 */
+	total_cost += cpu_operator_cost / 10.0 * evict_ratio * tuples;
+
+	/*
+	 * Now adjust for storing things in the cache, since that's not free
+	 * either.  Everything must go in the cache.  We don't proportion this
+	 * over any ratio, just apply it once for the scan.  We charge a
+	 * cpu_tuple_cost for the creation of the cache entry and also a
+	 * cpu_operator_cost for each tuple we expect to cache.
+	 */
+	total_cost += cpu_tuple_cost + cpu_operator_cost * tuples;
+
+	/*
+	 * Getting the first row must also be proportioned according to the
+	 * expected cache hit ratio.
+	 */
+	startup_cost = input_startup_cost * (1.0 - hit_ratio);
+
+	/*
+	 * Additionally we charge a cpu_tuple_cost to account for cache lookups,
+	 * which we'll do regardless of whether it was a cache hit or not.
+	 */
+	startup_cost += cpu_tuple_cost;
+
+	*rescan_startup_cost = startup_cost;
+	*rescan_total_cost = total_cost;
+}
+
 /*
  * cost_agg
  *		Determines and returns the cost of performing an Agg plan node,
@@ -4142,6 +4285,11 @@ cost_rescan(PlannerInfo *root, Path *path,
 				*rescan_total_cost = run_cost;
 			}
 			break;
+		case T_ResultCache:
+			/* All the hard work is done by cost_resultcache_rescan */
+			cost_resultcache_rescan(root, (ResultCachePath *) path,
+									rescan_startup_cost, rescan_total_cost);
+			break;
 		default:
 			*rescan_startup_cost = path->startup_cost;
 			*rescan_total_cost = path->total_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 57ce97fd53..3894991a95 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -18,10 +18,13 @@
 
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
+#include "utils/typcache.h"
 
 /* Hook for plugins to get control in add_paths_to_joinrel() */
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
@@ -52,6 +55,9 @@ static void try_partial_mergejoin_path(PlannerInfo *root,
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+										   RelOptInfo *outerrel,
+										   RelOptInfo *innerrel);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
@@ -163,6 +169,11 @@ add_paths_to_joinrel(PlannerInfo *root,
 	{
 		case JOIN_SEMI:
 		case JOIN_ANTI:
+
+			/*
+			 * XXX it may be worth proving this to allow a ResultCache to be
+			 * considered for Nested Loop Semi/Anti Joins.
+			 */
 			extra.inner_unique = false; /* well, unproven */
 			break;
 		case JOIN_UNIQUE_INNER:
@@ -354,6 +365,180 @@ allow_star_schema_join(PlannerInfo *root,
 			bms_nonempty_difference(inner_paramrels, outerrelids));
 }
 
+/*
+ * paraminfo_get_equal_hashops
+ *		Determine if param_info and innerrel's lateral_vars can be hashed.
+ *		Returns true if hashing is possible, otherwise returns false.
+ *
+ * Additionally, we collect the outer exprs and the hash operators for each
+ * parameter to innerrel.  These are set in 'param_exprs' and 'operators'
+ * when we return true.
+ */
+static bool
+paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List **param_exprs, List **operators)
+{
+	ListCell   *lc;
+
+	*param_exprs = NIL;
+	*operators = NIL;
+
+	if (param_info != NULL)
+	{
+		List	   *clauses = param_info->ppi_clauses;
+
+		foreach(lc, clauses)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			OpExpr	   *opexpr;
+			Node	   *expr;
+
+			/* can't use result cache without a valid hash equals operator */
+			if (!OidIsValid(rinfo->hasheqoperator) ||
+				!clause_sides_match_join(rinfo, outerrel, innerrel))
+			{
+				list_free(*operators);
+				list_free(*param_exprs);
+				return false;
+			}
+
+			/*
+			 * We already checked that this is an OpExpr with 2 args when
+			 * setting hasheqoperator.
+			 */
+			opexpr = (OpExpr *) rinfo->clause;
+			if (rinfo->outer_is_left)
+				expr = (Node *) linitial(opexpr->args);
+			else
+				expr = (Node *) lsecond(opexpr->args);
+
+			*operators = lappend_oid(*operators, rinfo->hasheqoperator);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
+	}
+
+	/* Now add any lateral vars to the cache key too */
+	foreach(lc, innerrel->lateral_vars)
+	{
+		Node	   *expr = (Node *) lfirst(lc);
+		TypeCacheEntry *typentry;
+
+		/* Reject if there are any volatile functions */
+		if (contain_volatile_functions(expr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		typentry = lookup_type_cache(exprType(expr),
+									 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);
+
+		/* can't use result cache without a valid hash equals operator */
+		if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		{
+			list_free(*operators);
+			list_free(*param_exprs);
+			return false;
+		}
+
+		*operators = lappend_oid(*operators, typentry->eq_opr);
+		*param_exprs = lappend(*param_exprs, expr);
+	}
+
+	/* We're okay to use result cache */
+	return true;
+}
+
+/*
+ * get_resultcache_path
+ *		If possible, make and return a Result Cache path atop of 'inner_path'.
+ *		Otherwise return NULL.
+ */
+static Path *
+get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel, Path *inner_path,
+					 Path *outer_path, JoinType jointype,
+					 JoinPathExtraData *extra)
+{
+	List	   *param_exprs;
+	List	   *hash_operators;
+	ListCell   *lc;
+
+	/* Obviously not if it's disabled */
+	if (!enable_resultcache)
+		return NULL;
+
+	/*
+	 * We can safely not bother with all this unless we expect to perform more
+	 * than one inner scan.  The first scan is always going to be a cache
+	 * miss.  This would likely fail later anyway based on costs, so this is
+	 * really just to save some wasted effort.
+	 */
+	if (outer_path->parent->rows < 2)
+		return NULL;
+
+	/*
+	 * We can only have a result cache when there's some kind of cache key,
+	 * either parameterized path clauses or lateral Vars.  With no cache key,
+	 * a Materialize node would likely be more useful.
+	 */
+	if ((inner_path->param_info == NULL ||
+		 inner_path->param_info->ppi_clauses == NIL) &&
+		innerrel->lateral_vars == NIL)
+		return NULL;
+
+	/*
+	 * Currently we don't do this for SEMI and ANTI joins unless they're
+	 * marked as inner_unique.  This is because nested loop SEMI/ANTI joins
+	 * don't scan the inner node to completion, which means the result cache
+	 * cannot mark the cache entry as complete.
+	 *
+	 * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
+	 * = true.  Should we?  See add_paths_to_joinrel()
+	 */
+	if (!extra->inner_unique && (jointype == JOIN_SEMI ||
+								 jointype == JOIN_ANTI))
+		return NULL;
+
+	/*
+	 * We can't use a result cache if there are volatile functions in the
+	 * inner rel's target list or restrict list.  A cache hit could reduce the
+	 * number of calls to these functions.
+	 */
+	if (contain_volatile_functions((Node *) innerrel->reltarget))
+		return NULL;
+
+	foreach(lc, innerrel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (contain_volatile_functions((Node *) rinfo))
+			return NULL;
+	}
+
+	/* Check if we have hash ops for each parameter to the path */
+	if (paraminfo_get_equal_hashops(root,
+									inner_path->param_info,
+									outerrel,
+									innerrel,
+									&param_exprs,
+									&hash_operators))
+	{
+		return (Path *) create_resultcache_path(root,
+												innerrel,
+												inner_path,
+												param_exprs,
+												hash_operators,
+												extra->inner_unique,
+												outer_path->parent->rows);
+	}
+
+	return NULL;
+}
+
 /*
  * try_nestloop_path
  *	  Consider a nestloop join path; if it appears useful, push it into
@@ -1471,6 +1656,7 @@ match_unsorted_outer(PlannerInfo *root,
 			foreach(lc2, innerrel->cheapest_parameterized_paths)
 			{
 				Path	   *innerpath = (Path *) lfirst(lc2);
+				Path	   *rcpath;
 
 				try_nestloop_path(root,
 								  joinrel,
@@ -1479,6 +1665,22 @@ match_unsorted_outer(PlannerInfo *root,
 								  merge_pathkeys,
 								  jointype,
 								  extra);
+
+				/*
+				 * Try generating a result cache path and see if that makes the
+				 * nested loop any cheaper.
+				 */
+				rcpath = get_resultcache_path(root, innerrel, outerrel,
+											  innerpath, outerpath, jointype,
+											  extra);
+				if (rcpath != NULL)
+					try_nestloop_path(root,
+									  joinrel,
+									  outerpath,
+									  rcpath,
+									  merge_pathkeys,
+									  jointype,
+									  extra);
 			}
 
 			/* Also consider materialized form of the cheapest inner path */
@@ -1633,6 +1835,7 @@ consider_parallel_nestloop(PlannerInfo *root,
 		foreach(lc2, innerrel->cheapest_parameterized_paths)
 		{
 			Path	   *innerpath = (Path *) lfirst(lc2);
+			Path	   *rcpath;
 
 			/* Can't join to an inner path that is not parallel-safe */
 			if (!innerpath->parallel_safe)
@@ -1657,6 +1860,17 @@ consider_parallel_nestloop(PlannerInfo *root,
 
 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
 									  pathkeys, jointype, extra);
+
+			/*
+			 * Try generating a result cache path and see if that makes the
+			 * nested loop any cheaper.
+			 */
+			rcpath = get_resultcache_path(root, innerrel, outerrel,
+										  innerpath, outerpath, jointype,
+										  extra);
+			if (rcpath != NULL)
+				try_partial_nestloop_path(root, joinrel, outerpath, rcpath,
+										  pathkeys, jointype, extra);
 		}
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index a56936e0e9..22f10fa339 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -91,6 +91,9 @@ static Result *create_group_result_plan(PlannerInfo *root,
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 									  int flags);
+static ResultCache *create_resultcache_plan(PlannerInfo *root,
+											ResultCachePath *best_path,
+											int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 								int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
@@ -277,6 +280,11 @@ static Sort *make_sort_from_groupcols(List *groupcls,
 									  AttrNumber *grpColIdx,
 									  Plan *lefttree);
 static Material *make_material(Plan *lefttree);
+static ResultCache *make_resultcache(Plan *lefttree, Oid *hashoperators,
+									 Oid *collations,
+									 List *param_exprs,
+									 bool singlerow,
+									 uint32 est_entries);
 static WindowAgg *make_windowagg(List *tlist, Index winref,
 								 int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
 								 int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -453,6 +461,11 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												 (MaterialPath *) best_path,
 												 flags);
 			break;
+		case T_ResultCache:
+			plan = (Plan *) create_resultcache_plan(root,
+													(ResultCachePath *) best_path,
+													flags);
+			break;
 		case T_Unique:
 			if (IsA(best_path, UpperUniquePath))
 			{
@@ -1566,6 +1579,56 @@ create_material_plan(PlannerInfo *root, MaterialPath *best_path, int flags)
 	return plan;
 }
 
+/*
+ * create_resultcache_plan
+ *	  Create a ResultCache plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static ResultCache *
+create_resultcache_plan(PlannerInfo *root, ResultCachePath *best_path, int flags)
+{
+	ResultCache *plan;
+	Plan	   *subplan;
+	Oid		   *operators;
+	Oid		   *collations;
+	List	   *param_exprs = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	int			nkeys;
+	int			i;
+
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  flags | CP_SMALL_TLIST);
+
+	param_exprs = (List *) replace_nestloop_params(root, (Node *)
+												   best_path->param_exprs);
+
+	nkeys = list_length(param_exprs);
+	Assert(nkeys > 0);
+	operators = palloc(nkeys * sizeof(Oid));
+	collations = palloc(nkeys * sizeof(Oid));
+
+	i = 0;
+	forboth(lc, param_exprs, lc2, best_path->hash_operators)
+	{
+		Expr	   *param_expr = (Expr *) lfirst(lc);
+		Oid			opno = lfirst_oid(lc2);
+
+		operators[i] = opno;
+		collations[i] = exprCollation((Node *) param_expr);
+		i++;
+	}
+
+	plan = make_resultcache(subplan, operators, collations, param_exprs,
+							best_path->singlerow, best_path->est_entries);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
 /*
  * create_unique_plan
  *	  Create a Unique plan for 'best_path' and (recursively) plans
@@ -6452,6 +6515,28 @@ materialize_finished_plan(Plan *subplan)
 	return matplan;
 }
 
+static ResultCache *
+make_resultcache(Plan *lefttree, Oid *hashoperators, Oid *collations,
+				 List *param_exprs, bool singlerow, uint32 est_entries)
+{
+	ResultCache *node = makeNode(ResultCache);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	node->numKeys = list_length(param_exprs);
+	node->hashOperators = hashoperators;
+	node->collations = collations;
+	node->param_exprs = param_exprs;
+	node->singlerow = singlerow;
+	node->est_entries = est_entries;
+
+	return node;
+}
+
 Agg *
 make_agg(List *tlist, List *qual,
 		 AggStrategy aggstrategy, AggSplit aggsplit,
@@ -7038,6 +7123,7 @@ is_projection_capable_path(Path *path)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_IncrementalSort:
 		case T_Unique:
@@ -7083,6 +7169,7 @@ is_projection_capable_plan(Plan *plan)
 	{
 		case T_Hash:
 		case T_Material:
+		case T_ResultCache:
 		case T_Sort:
 		case T_Unique:
 		case T_SetOp:
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 20df2152ea..8232f45c58 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -33,6 +33,7 @@
 #include "parser/analyze.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/typcache.h"
 
 /* These parameters are set by GUC */
 int			from_collapse_limit;
@@ -77,6 +78,7 @@ static bool check_equivalence_delay(PlannerInfo *root,
 static bool check_redundant_nullability_qual(PlannerInfo *root, Node *clause);
 static void check_mergejoinable(RestrictInfo *restrictinfo);
 static void check_hashjoinable(RestrictInfo *restrictinfo);
+static void check_resultcacheable(RestrictInfo *restrictinfo);
 
 
 /*****************************************************************************
@@ -2208,6 +2210,13 @@ distribute_restrictinfo_to_rels(PlannerInfo *root,
 			 */
 			check_hashjoinable(restrictinfo);
 
+			/*
+			 * Likewise, check if the clause is suitable to be used with a
+			 * Result Cache node to cache inner tuples during a parameterized
+			 * nested loop.
+			 */
+			check_resultcacheable(restrictinfo);
+
 			/*
 			 * Add clause to the join lists of all the relevant relations.
 			 */
@@ -2450,6 +2459,7 @@ build_implied_join_equality(PlannerInfo *root,
 	/* Set mergejoinability/hashjoinability flags */
 	check_mergejoinable(restrictinfo);
 	check_hashjoinable(restrictinfo);
+	check_resultcacheable(restrictinfo);
 
 	return restrictinfo;
 }
@@ -2697,3 +2707,34 @@ check_hashjoinable(RestrictInfo *restrictinfo)
 		!contain_volatile_functions((Node *) restrictinfo))
 		restrictinfo->hashjoinoperator = opno;
 }
+
+/*
+ * check_resultcacheable
+ *	  If the restrictinfo's clause is suitable to be used for a Result Cache
+ *	  node, set the hasheqoperator to the hash equality operator that will be
+ *	  needed during caching.
+ */
+static void
+check_resultcacheable(RestrictInfo *restrictinfo)
+{
+	TypeCacheEntry *typentry;
+	Expr	   *clause = restrictinfo->clause;
+	Node	   *leftarg;
+
+	if (restrictinfo->pseudoconstant)
+		return;
+	if (!is_opclause(clause))
+		return;
+	if (list_length(((OpExpr *) clause)->args) != 2)
+		return;
+
+	leftarg = linitial(((OpExpr *) clause)->args);
+
+	typentry = lookup_type_cache(exprType(leftarg), TYPECACHE_HASH_PROC |
+													TYPECACHE_EQ_OPR);
+
+	if (!OidIsValid(typentry->hash_proc) || !OidIsValid(typentry->eq_opr))
+		return;
+
+	restrictinfo->hasheqoperator = typentry->eq_opr;
+}
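
To illustrate what check_resultcacheable() is testing for: a clause only gets
a hasheqoperator if the left argument's type has both a hash function and an
equality operator in its default operator classes.  A rough sketch of that
test, assuming it is compiled inside the PostgreSQL source tree (the helper
name is made up for illustration):

#include "postgres.h"
#include "utils/typcache.h"

/* hypothetical helper: can values of 'typid' be used as a cache key? */
static bool
type_usable_as_cache_key(Oid typid)
{
	TypeCacheEntry *typentry;

	/* ask the type cache for the default hash proc and equality operator */
	typentry = lookup_type_cache(typid,
								 TYPECACHE_HASH_PROC | TYPECACHE_EQ_OPR);

	return OidIsValid(typentry->hash_proc) && OidIsValid(typentry->eq_opr);
}
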
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4a25431bec..6dd6f3001b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -752,6 +752,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_hash_references(root, plan, rtoffset);
 			break;
 
+		case T_ResultCache:
+			{
+				ResultCache *rcplan = (ResultCache *) plan;
+				rcplan->param_exprs = fix_scan_list(root, rcplan->param_exprs,
+													rtoffset,
+													NUM_EXEC_TLIST(plan));
+				break;
+			}
+
 		case T_Material:
 		case T_Sort:
 		case T_IncrementalSort:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 15b9453975..0881a208ac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2745,6 +2745,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			/* rescan_param does *not* get added to scan_params */
 			break;
 
+		case T_ResultCache:
+			finalize_primnode((Node *) ((ResultCache *) plan)->param_exprs,
+							  &context);
+			break;
+
 		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1c47a2fb49..b248b038e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1576,6 +1576,56 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	return pathnode;
 }
 
+/*
+ * create_resultcache_path
+ *	  Creates a path corresponding to a ResultCache plan, returning the
+ *	  pathnode.
+ */
+ResultCachePath *
+create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+						List *param_exprs, List *hash_operators,
+						bool singlerow, double calls)
+{
+	ResultCachePath *pathnode = makeNode(ResultCachePath);
+
+	Assert(subpath->parent == rel);
+
+	pathnode->path.pathtype = T_ResultCache;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = subpath->param_info;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe;
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+	pathnode->hash_operators = hash_operators;
+	pathnode->param_exprs = param_exprs;
+	pathnode->singlerow = singlerow;
+	pathnode->calls = calls;
+
+	/*
+	 * For now we set est_entries to 0.  cost_resultcache_rescan() does all
+	 * the hard work to determine how many cache entries there are likely to
+	 * be, so it seems best to leave it up to that function to fill this field
+	 * in.  If left at 0, the executor will make a guess at a good value.
+	 */
+	pathnode->est_entries = 0;
+
+	/*
+	 * Add a small additional charge for caching the first entry.  All the
+	 * harder calculations for rescans are performed in
+	 * cost_resultcache_rescan().
+	 */
+	pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
+	pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
+	pathnode->path.rows = subpath->rows;
+
+	return pathnode;
+}
+
 /*
  * create_unique_path
  *	  Creates a path representing elimination of distinct rows from the
@@ -3869,6 +3919,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
 									   apath->path.parallel_aware,
 									   -1);
 			}
+		case T_ResultCache:
+			{
+				ResultCachePath *rcpath = (ResultCachePath *) path;
+
+				return (Path *) create_resultcache_path(root, rel,
+														rcpath->subpath,
+														rcpath->param_exprs,
+														rcpath->hash_operators,
+														rcpath->singlerow,
+														rcpath->calls);
+			}
 		default:
 			break;
 	}
@@ -4087,6 +4148,16 @@ do { \
 			}
 			break;
 
+		case T_ResultCachePath:
+			{
+				ResultCachePath *rcpath;
+
+				FLAT_COPY_PATH(rcpath, path, ResultCachePath);
+				REPARAMETERIZE_CHILD_PATH(rcpath->subpath);
+				new_path = (Path *) rcpath;
+			}
+			break;
+
 		case T_GatherPath:
 			{
 				GatherPath *gpath;
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index 59ff35926e..aa9fb3a9fa 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -217,6 +217,8 @@ make_restrictinfo_internal(PlannerInfo *root,
 	restrictinfo->left_mcvfreq = -1;
 	restrictinfo->right_mcvfreq = -1;
 
+	restrictinfo->hasheqoperator = InvalidOid;
+
 	return restrictinfo;
 }
 
@@ -366,6 +368,7 @@ commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op)
 	result->right_bucketsize = rinfo->left_bucketsize;
 	result->left_mcvfreq = rinfo->right_mcvfreq;
 	result->right_mcvfreq = rinfo->left_mcvfreq;
+	result->hasheqoperator = InvalidOid;
 
 	return result;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03daec9a08..8a5d240385 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1036,6 +1036,16 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_resultcache", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of result caching."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_resultcache,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_nestloop", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of nested-loop join plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 791d39cf07..30cfddac1f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -366,6 +366,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_incremental_sort = on
+#enable_resultcache = on
 #enable_tidscan = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 34dd861eff..26dcc4485e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -275,6 +275,13 @@ extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const Oid *eqfunctions,
 										 const Oid *collations,
 										 PlanState *parent);
+extern ExprState *ExecBuildParamSetEqual(TupleDesc desc,
+										 const TupleTableSlotOps *lops,
+										 const TupleTableSlotOps *rops,
+										 const Oid *eqfunctions,
+										 const Oid *collations,
+										 const List *param_exprs,
+										 PlanState *parent);
 extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 											   ExprContext *econtext,
 											   TupleTableSlot *slot,
diff --git a/src/include/executor/nodeResultCache.h b/src/include/executor/nodeResultCache.h
new file mode 100644
index 0000000000..df671d16f9
--- /dev/null
+++ b/src/include/executor/nodeResultCache.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeResultCache.h
+ *
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeResultCache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODERESULTCACHE_H
+#define NODERESULTCACHE_H
+
+#include "nodes/execnodes.h"
+
+extern ResultCacheState *ExecInitResultCache(ResultCache *node, EState *estate, int eflags);
+extern void ExecEndResultCache(ResultCacheState *node);
+extern void ExecReScanResultCache(ResultCacheState *node);
+extern double ExecEstimateCacheEntryOverheadBytes(double ntuples);
+extern void ExecResultCacheEstimate(ResultCacheState *node,
+									ParallelContext *pcxt);
+extern void ExecResultCacheInitializeDSM(ResultCacheState *node,
+										 ParallelContext *pcxt);
+extern void ExecResultCacheInitializeWorker(ResultCacheState *node,
+											ParallelWorkerContext *pwcxt);
+extern void ExecResultCacheRetrieveInstrumentation(ResultCacheState *node);
+
+#endif							/* NODERESULTCACHE_H */
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index aa196428ed..ddbdb207af 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
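
dlist_move_tail() above is the primitive the Result Cache leans on for its LRU
bookkeeping: a cache hit moves the entry to the tail of the LRU list, and
evictions take victims from the head.  A rough sketch of that pattern,
assuming the PostgreSQL headers plus this patch are available (the entry
struct and function names here are hypothetical, not the patch's own):

#include "postgres.h"
#include "lib/ilist.h"

/* hypothetical cache entry carrying its LRU linkage */
typedef struct CacheEntry
{
	dlist_node	lru_node;		/* position in the LRU list */
	/* ... cache key and cached tuples would live here ... */
} CacheEntry;

/* on a cache hit, make 'entry' the most recently used item */
static void
cache_hit(dlist_head *lru_list, CacheEntry *entry)
{
	dlist_move_tail(lru_list, &entry->lru_node);
}

/* evict the least recently used entry, i.e. the one at the head */
static CacheEntry *
cache_evict_oldest(dlist_head *lru_list)
{
	dlist_node *node;

	if (dlist_is_empty(lru_list))
		return NULL;

	node = dlist_pop_head_node(lru_list);
	return dlist_container(CacheEntry, lru_node, node);
}
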
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3b39369a49..52d1fa018b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
+#include "lib/ilist.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
@@ -2037,6 +2038,71 @@ typedef struct MaterialState
 	Tuplestorestate *tuplestorestate;
 } MaterialState;
 
+struct ResultCacheEntry;
+struct ResultCacheTuple;
+struct ResultCacheKey;
+
+typedef struct ResultCacheInstrumentation
+{
+	uint64		cache_hits;		/* number of rescans where we've found the
+								 * scan parameter values to be cached */
+	uint64		cache_misses;	/* number of rescans where we've not found the
+								 * scan parameter values to be cached. */
+	uint64		cache_evictions;	/* number of cache entries removed due to
+									 * the need to free memory */
+	uint64		cache_overflows;	/* number of times we've had to bypass the
+									 * cache when filling it due to not being
+									 * able to free enough space to store the
+									 * current scan's tuples. */
+	uint64		mem_peak;		/* peak memory usage in bytes */
+} ResultCacheInstrumentation;
+
+/* ----------------
+ *	 Shared memory container for per-worker resultcache information
+ * ----------------
+ */
+typedef struct SharedResultCacheInfo
+{
+	int			num_workers;
+	ResultCacheInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedResultCacheInfo;
+
+/* ----------------
+ *	 ResultCacheState information
+ *
+ *		resultcache nodes are used to cache recent and commonly seen results
+ *		from a parameterized scan.
+ * ----------------
+ */
+typedef struct ResultCacheState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	int			rc_status;		/* value of ExecResultCache state machine */
+	int			nkeys;			/* number of cache keys */
+	struct resultcache_hash *hashtable; /* hash table for cache entries */
+	TupleDesc	hashkeydesc;	/* tuple descriptor for cache keys */
+	TupleTableSlot *tableslot;	/* min tuple slot for existing cache entries */
+	TupleTableSlot *probeslot;	/* virtual slot used for hash lookups */
+	ExprState  *cache_eq_expr;	/* Compare exec params to hash key */
+	ExprState **param_exprs;	/* exprs containing the parameters to this
+								 * node */
+	FmgrInfo   *hashfunctions;	/* lookup data for hash funcs nkeys in size */
+	Oid		   *collations;		/* collation for comparisons nkeys in size */
+	uint64		mem_used;		/* bytes of memory used by cache */
+	uint64		mem_limit;		/* memory limit in bytes for the cache */
+	MemoryContext tableContext; /* memory context to store cache data */
+	dlist_head	lru_list;		/* least recently used entry list */
+	struct ResultCacheTuple *last_tuple;	/* Used to point to the last tuple
+											 * returned during a cache hit and
+											 * the tuple we last stored when
+											 * populating the cache. */
+	struct ResultCacheEntry *entry; /* the entry that 'last_tuple' belongs to
+									 * or NULL if 'last_tuple' is NULL. */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first tuple. */
+	ResultCacheInstrumentation stats;	/* execution statistics */
+	SharedResultCacheInfo *shared_info; /* statistics for parallel workers */
+} ResultCacheState;
 
 /* ----------------
  *	 When performing sorting by multiple keys, it's possible that the input
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 704f00fd30..2051abbbf9 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -74,6 +74,7 @@ typedef enum NodeTag
 	T_MergeJoin,
 	T_HashJoin,
 	T_Material,
+	T_ResultCache,
 	T_Sort,
 	T_IncrementalSort,
 	T_Group,
@@ -132,6 +133,7 @@ typedef enum NodeTag
 	T_MergeJoinState,
 	T_HashJoinState,
 	T_MaterialState,
+	T_ResultCacheState,
 	T_SortState,
 	T_IncrementalSortState,
 	T_GroupState,
@@ -242,6 +244,7 @@ typedef enum NodeTag
 	T_MergeAppendPath,
 	T_GroupResultPath,
 	T_MaterialPath,
+	T_ResultCachePath,
 	T_UniquePath,
 	T_GatherPath,
 	T_GatherMergePath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e4e1c15986..a65bda7e3c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1494,6 +1494,25 @@ typedef struct MaterialPath
 	Path	   *subpath;
 } MaterialPath;
 
+/*
+ * ResultCachePath represents a ResultCache plan node, i.e., a cache over a
+ * parameterized path that stores tuples so that the underlying node need not
+ * be rescanned for parameter values which are already cached.
+ */
+typedef struct ResultCachePath
+{
+	Path		path;
+	Path	   *subpath;		/* outerpath to cache tuples from */
+	List	   *hash_operators; /* hash operators for each key */
+	List	   *param_exprs;	/* cache keys */
+	bool		singlerow;		/* true if the cache entry is to be marked as
+								 * complete after caching the first record. */
+	double		calls;			/* expected number of rescans */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCachePath;
+
 /*
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
@@ -2091,6 +2110,9 @@ typedef struct RestrictInfo
 	Selectivity right_bucketsize;	/* avg bucketsize of right side */
 	Selectivity left_mcvfreq;	/* left side's most common val's freq */
 	Selectivity right_mcvfreq;	/* right side's most common val's freq */
+
+	/* hash equality operator used for result cache, else InvalidOid */
+	Oid			hasheqoperator;
 } RestrictInfo;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 623dc450ee..1678bd66fe 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -779,6 +779,27 @@ typedef struct Material
 	Plan		plan;
 } Material;
 
+/* ----------------
+ *		result cache node
+ * ----------------
+ */
+typedef struct ResultCache
+{
+	Plan		plan;
+
+	int			numKeys;		/* size of the two arrays below */
+
+	Oid		   *hashOperators;	/* hash operators for each key */
+	Oid		   *collations;		/* collations for each key */
+	List	   *param_exprs;	/* exprs containing parameters */
+	bool		singlerow;		/* true if the cache entry should be marked as
+								 * complete after we store the first tuple in
+								 * it. */
+	uint32		est_entries;	/* The maximum number of entries that the
+								 * planner expects will fit in the cache, or 0
+								 * if unknown */
+} ResultCache;
+
 /* ----------------
  *		sort node
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index a3fd93fe07..0fe60d82e4 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
+extern PGDLLIMPORT bool enable_resultcache;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
 extern PGDLLIMPORT bool enable_gathermerge;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d539bc2783..53261ee91f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,13 @@ extern GroupResultPath *create_group_result_path(PlannerInfo *root,
 												 PathTarget *target,
 												 List *havingqual);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
+extern ResultCachePath *create_resultcache_path(PlannerInfo *root,
+												RelOptInfo *rel,
+												Path *subpath,
+												List *param_exprs,
+												List *hash_operators,
+												bool singlerow,
+												double calls);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 									  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1ae0e5d939..ca06d41dd0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2584,6 +2584,7 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
@@ -2599,6 +2600,7 @@ explain (costs off)
                ->  Seq Scan on onek
 (8 rows)
 
+reset enable_resultcache;
 --
 -- Hash Aggregation Spill tests
 --
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04e802d421..86fd3907c5 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2536,6 +2536,7 @@ reset enable_nestloop;
 --
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2559,6 +2560,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
 --
@@ -3663,8 +3665,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3674,17 +3676,19 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3694,9 +3698,11 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Index Scan using tenk1_unique2 on tenk1 t3
-               Index Cond: (unique2 = t2.thousand)
-(11 rows)
+         ->  Result Cache
+               Cache Key: t2.thousand
+               ->  Index Scan using tenk1_unique2 on tenk1 t3
+                     Index Cond: (unique2 = t2.thousand)
+(13 rows)
 
 explain (costs off)
 select count(*) from
@@ -4210,8 +4216,8 @@ where t1.f1 = ss.f1;
                     QUERY PLAN                    
 --------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
-   Join Filter: (t1.f1 = t2.f1)
+   Output: t1.f1, i8.q1, i8.q2, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop Left Join
          Output: t1.f1, i8.q1, i8.q2
          ->  Seq Scan on public.text_tbl t1
@@ -4221,11 +4227,14 @@ where t1.f1 = ss.f1;
                ->  Seq Scan on public.int8_tbl i8
                      Output: i8.q1, i8.q2
                      Filter: (i8.q2 = 123)
-   ->  Limit
-         Output: (i8.q1), t2.f1
-         ->  Seq Scan on public.text_tbl t2
-               Output: i8.q1, t2.f1
-(16 rows)
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: i8.q1
+         ->  Limit
+               Output: (i8.q1), t2.f1
+               ->  Seq Scan on public.text_tbl t2
+                     Output: i8.q1, t2.f1
+(19 rows)
 
 select * from
   text_tbl t1
@@ -4246,13 +4255,13 @@ select * from
   lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss1,
   lateral (select ss1.* from text_tbl t3 limit 1) as ss2
 where t1.f1 = ss2.f1;
-                            QUERY PLAN                             
--------------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop
-   Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1, ((i8.q1)), (t2.f1)
-   Join Filter: (t1.f1 = (t2.f1))
+   Output: t1.f1, i8.q1, i8.q2, q1, f1, q1, f1
+   Join Filter: (t1.f1 = f1)
    ->  Nested Loop
-         Output: t1.f1, i8.q1, i8.q2, (i8.q1), t2.f1
+         Output: t1.f1, i8.q1, i8.q2, q1, f1
          ->  Nested Loop Left Join
                Output: t1.f1, i8.q1, i8.q2
                ->  Seq Scan on public.text_tbl t1
@@ -4262,15 +4271,21 @@ where t1.f1 = ss2.f1;
                      ->  Seq Scan on public.int8_tbl i8
                            Output: i8.q1, i8.q2
                            Filter: (i8.q2 = 123)
+         ->  Result Cache
+               Output: q1, f1
+               Cache Key: i8.q1
+               ->  Limit
+                     Output: (i8.q1), t2.f1
+                     ->  Seq Scan on public.text_tbl t2
+                           Output: i8.q1, t2.f1
+   ->  Result Cache
+         Output: q1, f1
+         Cache Key: q1, f1
          ->  Limit
-               Output: (i8.q1), t2.f1
-               ->  Seq Scan on public.text_tbl t2
-                     Output: i8.q1, t2.f1
-   ->  Limit
-         Output: ((i8.q1)), (t2.f1)
-         ->  Seq Scan on public.text_tbl t3
-               Output: (i8.q1), t2.f1
-(22 rows)
+               Output: (q1), (f1)
+               ->  Seq Scan on public.text_tbl t3
+                     Output: q1, f1
+(28 rows)
 
 select * from
   text_tbl t1
@@ -4316,14 +4331,17 @@ where tt1.f1 = ss1.c0;
                      ->  Seq Scan on public.text_tbl tt4
                            Output: tt4.f1
                            Filter: (tt4.f1 = 'foo'::text)
-   ->  Subquery Scan on ss1
+   ->  Result Cache
          Output: ss1.c0
-         Filter: (ss1.c0 = 'foo'::text)
-         ->  Limit
-               Output: (tt4.f1)
-               ->  Seq Scan on public.text_tbl tt5
-                     Output: tt4.f1
-(29 rows)
+         Cache Key: tt4.f1
+         ->  Subquery Scan on ss1
+               Output: ss1.c0
+               Filter: (ss1.c0 = 'foo'::text)
+               ->  Limit
+                     Output: (tt4.f1)
+                     ->  Seq Scan on public.text_tbl tt5
+                           Output: tt4.f1
+(32 rows)
 
 select 1 from
   text_tbl as tt1
@@ -4997,34 +5015,40 @@ select count(*) from tenk1 a, lateral generate_series(1,two) g;
 
 explain (costs off)
   select count(*) from tenk1 a, lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
   select count(*) from tenk1 a, generate_series(1,two) g;
-                   QUERY PLAN                   
-------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
-         ->  Function Scan on generate_series g
-(4 rows)
+         ->  Result Cache
+               Cache Key: a.two
+               ->  Function Scan on generate_series g
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -5079,14 +5103,15 @@ explain (costs off)
                             QUERY PLAN                            
 ------------------------------------------------------------------
  Aggregate
-   ->  Hash Join
-         Hash Cond: ("*VALUES*".column1 = b.unique2)
+   ->  Nested Loop
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
                ->  Values Scan on "*VALUES*"
-         ->  Hash
+         ->  Result Cache
+               Cache Key: "*VALUES*".column1
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Index Cond: (unique2 = "*VALUES*".column1)
+(9 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index c4e827caec..10f3ce3d07 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,6 +1958,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
@@ -2086,8 +2089,8 @@ create index ab_a3_b3_a_idx on ab_a3_b3 (a);
 set enable_hashjoin = 0;
 set enable_mergejoin = 0;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2096,32 +2099,36 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2130,31 +2137,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = (a.a + 0))
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: (a.a + 0)
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = (a.a + 0))
+(31 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2163,30 +2174,34 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-(27 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+(31 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                        explain_parallel_append                                         
---------------------------------------------------------------------------------------------------------
+                                           explain_parallel_append                                            
+--------------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2196,31 +2211,35 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
-                                     explain_parallel_append                                     
--------------------------------------------------------------------------------------------------
+                                        explain_parallel_append                                         
+--------------------------------------------------------------------------------------------------------
  Finalize Aggregate (actual rows=N loops=N)
    ->  Gather (actual rows=N loops=N)
          Workers Planned: 1
@@ -2230,26 +2249,30 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
                            Rows Removed by Filter: N
-                     ->  Append (actual rows=N loops=N)
-                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
-                                 Index Cond: (a = a.a)
-                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
-                                 Index Cond: (a = a.a)
-(28 rows)
+                     ->  Result Cache (actual rows=N loops=N)
+                           Cache Key: a.a
+                           Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           Worker 0:  Hits: N  Misses: N  Evictions: 0  Overflows: 0  Memory Usage: NkB
+                           ->  Append (actual rows=N loops=N)
+                                 ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                       Index Cond: (a = a.a)
+                                 ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                       Index Cond: (a = a.a)
+(32 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
new file mode 100644
index 0000000000..65d9e25169
--- /dev/null
+++ b/src/test/regress/expected/resultcache.out
@@ -0,0 +1,158 @@
+-- Perform tests on the Result Cache node.
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+	ln := regexp_replace(ln, 'Heap Fetches: \d+', 'Heap Fetches: N');
+        return next ln;
+    end loop;
+end;
+$$;
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SET enable_bitmapscan TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Seq Scan on tenk1 t2 (actual rows=1000 loops=1)
+               Filter: (unique1 < 1000)
+               Rows Removed by Filter: 9000
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t2.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t2.twenty)
+                     Heap Fetches: N
+(11 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+                                    explain_resultcache                                     
+--------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1000 loops=1)
+         ->  Seq Scan on tenk1 t1 (actual rows=1000 loops=1)
+               Filter: (unique1 < 1000)
+               Rows Removed by Filter: 9000
+         ->  Result Cache (actual rows=1 loops=1000)
+               Cache Key: t1.twenty
+               Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
+                     Index Cond: (unique1 = t1.twenty)
+                     Heap Fetches: N
+(11 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 1200;', true);
+                                     explain_resultcache                                      
+----------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1200 loops=1)
+         ->  Seq Scan on tenk1 t2 (actual rows=1200 loops=1)
+               Filter: (unique1 < 1200)
+               Rows Removed by Filter: 8800
+         ->  Result Cache (actual rows=1 loops=1200)
+               Cache Key: t2.thousand
+               Hits: N  Misses: N  Evictions: N  Overflows: 0  Memory Usage: NkB
+               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=1028)
+                     Index Cond: (unique1 = t2.thousand)
+                     Heap Fetches: N
+(11 rows)
+
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_bitmapscan;
+RESET enable_hashjoin;
+-- Test parallel plans with Result Cache.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SET max_parallel_workers_per_gather TO 2;
+-- Ensure we get a parallel plan.
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Finalize Aggregate
+   ->  Gather
+         Workers Planned: 2
+         ->  Partial Aggregate
+               ->  Nested Loop
+                     ->  Parallel Bitmap Heap Scan on tenk1 t1
+                           Recheck Cond: (unique1 < 1000)
+                           ->  Bitmap Index Scan on tenk1_unique1
+                                 Index Cond: (unique1 < 1000)
+                     ->  Result Cache
+                           Cache Key: t1.twenty
+                           ->  Index Only Scan using tenk1_unique1 on tenk1 t2
+                                 Index Cond: (unique1 = t1.twenty)
+(13 rows)
+
+-- And ensure the parallel plan gives us the correct results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+ count |        avg         
+-------+--------------------
+  1000 | 9.5000000000000000
+(1 row)
+
+RESET max_parallel_workers_per_gather;
+RESET parallel_tuple_cost;
+RESET parallel_setup_cost;
+RESET min_parallel_table_scan_size;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index d5532d0ccc..c7986fb7fc 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1091,19 +1091,21 @@ select sum(o.four), sum(ss.a) from
     select * from x
   ) ss
 where o.ten = 1;
-                    QUERY PLAN                     
----------------------------------------------------
+                       QUERY PLAN                        
+---------------------------------------------------------
  Aggregate
    ->  Nested Loop
          ->  Seq Scan on onek o
                Filter: (ten = 1)
-         ->  CTE Scan on x
-               CTE x
-                 ->  Recursive Union
-                       ->  Result
-                       ->  WorkTable Scan on x x_1
-                             Filter: (a < 10)
-(10 rows)
+         ->  Result Cache
+               Cache Key: o.four
+               ->  CTE Scan on x
+                     CTE x
+                       ->  Recursive Union
+                             ->  Result
+                             ->  WorkTable Scan on x x_1
+                                   Filter: (a < 10)
+(12 rows)
 
 select sum(o.four), sum(ss.a) from
   onek o cross join lateral (
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 98dde452e6..0bb558d93c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -111,10 +111,11 @@ select name, setting from pg_settings where name like 'enable%';
  enable_partition_pruning       | on
  enable_partitionwise_aggregate | off
  enable_partitionwise_join      | off
+ enable_resultcache             | on
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(19 rows)
+(20 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 312c11a4bd..2e89839089 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression resultcache
 
 # event triggers cannot run concurrently with any test that runs DDL
 # oidjoins is read-only, though, and should run late for best coverage
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5a80bfacd8..a46f3d0178 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -203,6 +203,7 @@ test: partition_info
 test: tuplesort
 test: explain
 test: compression
+test: resultcache
 test: event_trigger
 test: oidjoins
 test: fast_default
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index eb53668299..eb80a2fe06 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1098,9 +1098,11 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+set enable_resultcache to off;
 explain (costs off)
   select 1 from tenk1
    where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_resultcache;
 
 --
 -- Hash Aggregation Spill tests
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8164383fb5..7f866c603b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -550,6 +550,7 @@ reset enable_nestloop;
 
 set work_mem to '64kB';
 set enable_mergejoin to off;
+set enable_resultcache to off;
 
 explain (costs off)
 select count(*) from tenk1 a, tenk1 b
@@ -559,6 +560,7 @@ select count(*) from tenk1 a, tenk1 b
 
 reset work_mem;
 reset enable_mergejoin;
+reset enable_resultcache;
 
 --
 -- regression test for 8.2 bug with improper re-ordering of left joins
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6ccb52ad1d..bd40779d31 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,6 +464,9 @@ begin
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
         ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+        ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
         return next ln;
     end loop;
 end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
new file mode 100644
index 0000000000..2be5b8f2d8
--- /dev/null
+++ b/src/test/regress/sql/resultcache.sql
@@ -0,0 +1,91 @@
+-- Perform tests on the Result Cache node.
+
+-- The cache hits/misses/evictions from the Result Cache node can vary between
+-- machines.  Let's just replace the number with an 'N'.  In order to allow us
+-- to perform validation when the measure was zero, we replace a zero value
+-- with "Zero".  All other numbers are replaced with 'N'.
+create function explain_resultcache(query text, hide_hitmiss bool) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            query)
+    loop
+        if hide_hitmiss = true then
+                ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
+                ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
+                ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
+                ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
+        end if;
+        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
+        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
+        ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+	ln := regexp_replace(ln, 'Heap Fetches: \d+', 'Heap Fetches: N');
+        return next ln;
+    end loop;
+end;
+$$;
+
+-- Ensure we get a result cache on the inner side of the nested loop
+SET enable_hashjoin TO off;
+SET enable_bitmapscan TO off;
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
+WHERE t2.unique1 < 1000;
+
+-- Try with LATERAL joins
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- Reduce work_mem so that we see some cache evictions
+SET work_mem TO '64kB';
+SET enable_mergejoin TO off;
+-- Ensure we get some evictions.  We're unable to validate the hits and misses
+-- here as the number of entries that fit in the cache at once will vary
+-- between different machines.
+SELECT explain_resultcache('
+SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
+INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
+WHERE t2.unique1 < 1200;', true);
+RESET enable_mergejoin;
+RESET work_mem;
+RESET enable_bitmapscan;
+RESET enable_hashjoin;
+
+-- Test parallel plans with Result Cache.
+SET min_parallel_table_scan_size TO 0;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SET max_parallel_workers_per_gather TO 2;
+
+-- Ensure we get a parallel plan.
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+-- And ensure the parallel plan gives us the correct results.
+SELECT COUNT(*),AVG(t2.unique1) FROM tenk1 t1,
+LATERAL (SELECT t2.unique1 FROM tenk1 t2 WHERE t1.twenty = t2.unique1) t2
+WHERE t1.unique1 < 1000;
+
+RESET max_parallel_workers_per_gather;
+RESET parallel_tuple_cost;
+RESET parallel_setup_cost;
+RESET min_parallel_table_scan_size;
-- 
2.27.0

#104houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: David Rowley (#103)
RE: Hybrid Hash/Nested Loop joins and caching results from subplans

I've attached the updated patch. I'll let the CFbot grab this to ensure it's
happy with it before I go looking to push it again.

Hi,

I took a look into the patch and noticed some minor things.

1.
+		case T_ResultCache:
+			ptype = "ResultCache";
+			subpath = ((ResultCachePath *) path)->subpath;
+			break;
 		case T_UniquePath:
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
should we use "case T_ResultCachePath" here?

2.
Is it better to add ResultCache's info to " src/backend/optimizer/README " ?
Something like:
NestPath - nested-loop joins
MergePath - merge joins
HashPath - hash joins
+ ResultCachePath - Result cache

Best regards,
Hou zhijie

#105David Rowley
dgrowleyml@gmail.com
In reply to: houzj.fnst@fujitsu.com (#104)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Thu, 1 Apr 2021 at 23:41, houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

I've attached the updated patch. I'll let the CFbot grab this to ensure it's
happy with it before I go looking to push it again.

Hi,

I took a look into the patch and noticed some minor things.

1.
+               case T_ResultCache:
+                       ptype = "ResultCache";
+                       subpath = ((ResultCachePath *) path)->subpath;
+                       break;
                case T_UniquePath:
                        ptype = "Unique";
                        subpath = ((UniquePath *) path)->subpath;
should we use "case T_ResultCachePath" here?

2.
Is it better to add ResultCache's info to " src/backend/optimizer/README " ?
Something like:
NestPath - nested-loop joins
MergePath - merge joins
HashPath - hash joins
+ ResultCachePath - Result cache

Thanks for pointing those two things out.

I've pushed the patch again with some updates to EXPLAIN to fix the
issue from yesterday. I also disabled result cache in the
partition_prune tests as I suspect that the parallel tests there might
just be a bit too unstable in the buildfarm. The cache
hit/miss/eviction line might disappear if the main process does not
get a chance to do any work.

Well, it's now time to watch the buildfarm again...

David

#106Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#91)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Fri, Mar 12, 2021 at 8:31 AM David Rowley <dgrowleyml@gmail.com> wrote:

Thanks for these suggestions.

On Mon, 22 Feb 2021 at 14:21, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:

To summarise here, the planner performance gets a fair bit worse with
the patched code. With master, summing the average planning time over
each of the queries resulted in a total planning time of 765.7 ms.
After patching, that went up to 1097.5 ms. I was pretty disappointed
about that.

I have a couple ideas;

I just checked the latest code, looks like we didn't improve this
situation except that we introduced a GUC to control it. Am I missing
something? I don't have a suggestion though.

- default enable_resultcache=off seems okay. In plenty of cases, planning
  time is unimportant. This is the "low bar" - if we can do better and
  enable it by default, that's great.

I think that's reasonable. Teaching the planner to do new tricks is
never going to make the planner produce plans more quickly. When the
new planner trick gives us a more optimal plan, then great. When it
does not, it's wasted effort. Giving users the ability to switch off
the planner's new ability seems like a good way to keep people happy
who continually find that the additional effort costs more than it
saves.

- Maybe this should be integrated into nestloop rather than being a separate
  plan node. That means that it could be dynamically enabled during
  execution, maybe after a few loops or after checking that there's at least
  some minimal number of repeated keys and cache hits. cost_nestloop would
  consider whether to use a result cache or not, and explain would show the
  cache stats as a part of nested loop. In this case, I propose there'd still
  be a GUC to disable it.

There was quite a bit of discussion on that topic already on this
thread. I don't really want to revisit that.

The main problem with that is that we'd be forced into costing a
Nested loop with a result cache exactly the same as we do for a plain
nested loop. If we were to lower the cost to account for the cache
hits then the planner is more likely to choose a nested loop over a
merge/hash join. If we then switched the caching off during execution
due to low cache hits then that does not magically fix the bad choice
of join method. The planner may have gone with a Hash Join if it had
known the cache hit ratio would be that bad. We'd still be left to
deal with the poorly performing nested loop. What you'd really want
instead of turning the cache off would be to have nested loop ditch
the parameter scan and just morph itself into a Hash Join node. (I'm
not proposing we do that)

- Maybe cost_resultcache() can be split into initial_cost and final_cost
  parts, same as for nestloop? I'm not sure how it'd work, since
  initial_cost is supposed to return a lower bound, and resultcache tries to
  make things cheaper. initial_cost would just add some operator/tuple costs
  to make sure that resultcache of a unique scan is more expensive than
  nestloop alone. estimate_num_groups is at least O(n) WRT
  rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) *
  cpu_operator_cost in initial_cost and then call estimate_num_groups in
  final_cost. We'd be estimating the cost of estimating the cost...

The cost of the Result Cache is pretty dependent on the n_distinct
estimate. Low numbers of distinct values tend to estimate a high
number of cache hits, whereas large n_distinct values (relative to the
number of outer rows) are not going to estimate a large number of
cache hits.

I don't think feeding in a fake value would help us here. We'd
probably do better if we had a fast way to determine if a given Expr
is unique (e.g. the UniqueKeys patch). Result Cache is never going to
be a win for a parameter whose value is never the same as some
previously seen value. This would likely allow us to skip considering
a Result Cache for the majority of OLTP type joins.

- Maybe an initial implementation of this would only add a result cache if
  the best plan was already going to use a nested loop, even though a cached
  nested loop might be cheaper than other plans. This would avoid most
  planner costs, and give improved performance at execution time, but leaves
  something "on the table" for the future.

+cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
+                     Cost *rescan_startup_cost, Cost *rescan_total_cost)
+{
+     double          tuples = rcpath->subpath->rows;
+     double          calls = rcpath->calls;
...
+     /* estimate on the distinct number of parameter values */
+     ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
+                                     &estinfo);

Shouldn't this pass "tuples" and not "calls"?

Hmm, I don't think so. "calls" is the estimated number of outer side
rows. Here you're asking if the n_distinct estimate is relevant to
the inner side rows. It's not. If we expect to be called 1000 times by
the outer side of the nested loop, then we need to know our n_distinct
estimate for those 1000 rows. If the estimate comes back as 10
distinct values and we see that we're likely to be able to fit all the
tuples for those 10 distinct values in the cache, then the hit ratio
is going to come out at 99%: 10 misses, one for the first time each
value is looked up, and the remaining 990 calls will be hits. The
number of tuples (and the width of tuples) on the inside of the nested
loop is only relevant to calculating how many cache entries are likely
to fit into hash_mem. When we think cache entries will be evicted,
that makes the cache hit calculation more complex.
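
To put rough numbers on that, here's a tiny standalone sketch of the
arithmetic (this is not the patch's costing code and the function name
is made up), assuming no entries get evicted because everything fits
in the cache:

#include <stdio.h>

/*
 * Sketch of the hit-ratio estimate: each distinct parameter value
 * misses once, on its first lookup; every later call for that value
 * is a hit.  Assumes all ndistinct entries fit in hash_mem at once.
 */
static double
estimate_hit_ratio(double calls, double ndistinct)
{
    double misses;

    if (calls <= 0)
        return 0.0;

    misses = (ndistinct < calls) ? ndistinct : calls;

    return (calls - misses) / calls;
}

int
main(void)
{
    /* 1000 outer rows with 10 distinct key values: 10 misses, 990 hits */
    printf("hit ratio = %.2f\n", estimate_hit_ratio(1000, 10)); /* 0.99 */
    return 0;
}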

I've tried to explain what's going on in cost_resultcache_rescan() the
best I can with comments. I understand it's still pretty hard to
follow what's going on. I'm open to making it easier to understand if
you have suggestions.

David

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#107David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#106)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans

On Wed, 26 May 2021 at 14:19, Andy Fan <zhihui.fan1213@gmail.com> wrote:

I just checked the latest code, looks like we didn't improve this situation except
that we introduced a GUC to control it. Am I missing something? I don't have a
suggestion though.

Various extra caching was done to help speed it up. We now cache the
volatility of RestrictInfo and PathTarget.

I also added caching for the hash function in RestrictInfo so that we
could more quickly determine if we can Result Cache or not.

There's still a bit of caching left that I didn't do. This is around
lateral_vars. I've nowhere to cache the hash function since that's
just a list of vars. At the moment we need to check that each time we
consider a result cache path. LATERAL joins are a bit less common so
I didn't think that would be a huge issue. There's always
enable_resultcache = off for people who cannot tolerate the overhead.

Also, it's never going to be 100% as fast as it was. We're considering
another path that we didn't consider before.

Did you do some performance testing that caused you to bring this topic up?

David