why partition pruning doesn't work?

Started by Pavel Stehule over 7 years ago, 72 messages
#1 Pavel Stehule
pavel.stehule@gmail.com

Hi

CREATE TABLE data(a text, vlozeno date) PARTITION BY RANGE(vlozeno);
CREATE TABLE data_2016 PARTITION OF data FOR VALUES FROM ('2016-01-01') TO ('2016-12-31');
CREATE TABLE data_2017 PARTITION OF data FOR VALUES FROM ('2017-01-01') TO ('2017-12-31');
CREATE TABLE data_other PARTITION OF DATA DEFAULT;

insert into data select 'ahoj', '2016-01-01'::date + (random() * 900)::int from generate_series(1,1000000);
analyze data;

postgres=# explain analyze select * from data where vlozeno > '2018-06-01';
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..3519.83 rows=20001 width=9) (actual time=0.042..27.750 rows=19428 loops=1)
   ->  Seq Scan on data_other  (cost=0.00..3419.83 rows=20001 width=9) (actual time=0.040..25.895 rows=19428 loops=1)
         Filter: (vlozeno > '2018-06-01'::date)
         Rows Removed by Filter: 171518
 Planning Time: 0.766 ms
 Execution Time: 28.718 ms
(6 rows)

postgres=# explain analyze select * from data where vlozeno > current_date;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..17281.36 rows=20080 width=9) (actual time=0.749..95.389 rows=19428 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..14273.36 rows=8367 width=9) (actual time=59.141..89.458 rows=6476 loops=3)
         ->  Parallel Seq Scan on data_2016  (cost=0.00..5768.69 rows=24 width=9) (actual time=34.847..34.847 rows=0 loops=3)
               Filter: (vlozeno > CURRENT_DATE)
               Rows Removed by Filter: 135119
         ->  Parallel Seq Scan on data_2017  (cost=0.00..5745.02 rows=23 width=9) (actual time=53.269..53.269 rows=0 loops=2)
               Filter: (vlozeno > CURRENT_DATE)
               Rows Removed by Filter: 201848
         ->  Parallel Seq Scan on data_other  (cost=0.00..2717.82 rows=11765 width=9) (actual time=0.044..55.502 rows=19428 loops=1)
               Filter: (vlozeno > CURRENT_DATE)
               Rows Removed by Filter: 171518
 Planning Time: 0.677 ms
 Execution Time: 98.349 ms
(15 rows)

but

postgres=# explain analyze select * from data where vlozeno > (select current_date);
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.01..19574.68 rows=333333 width=9) (actual time=0.095..31.945 rows=19428 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
   ->  Seq Scan on data_2016  (cost=0.00..7258.98 rows=135119 width=9) (never executed)
         Filter: (vlozeno > $0)
   ->  Seq Scan on data_2017  (cost=0.00..7229.20 rows=134565 width=9) (never executed)
         Filter: (vlozeno > $0)
   ->  Seq Scan on data_other  (cost=0.00..3419.83 rows=63649 width=9) (actual time=0.069..29.856 rows=19428 loops=1)
         Filter: (vlozeno > $0)
         Rows Removed by Filter: 171518
 Planning Time: 0.418 ms
 Execution Time: 33.019 ms
(12 rows)

Partition pruning is working now.

Is it expected? Tested on fresh master.

The commit message
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=499be013de65242235ebdde06adb08db887f0ea5
says that Append should be supported.

Regards

Pavel
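
[Editorial illustration] For readers following along, the pruning decision behind the first plan can be modelled with a small C sketch. This is a toy model and not PostgreSQL code: the ToyRangePart struct, both functions, and the encoding of dates as yyyymmdd integers are assumptions made purely for illustration. The idea is that a range partition covering [lo, hi) can be skipped for "key > bound" whenever hi <= bound; the DEFAULT partition is simply always kept in this simplified model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one range partition covering [lo, hi). */
typedef struct ToyRangePart
{
    const char *name;
    int         lo;         /* inclusive lower bound, date encoded as yyyymmdd */
    int         hi;         /* exclusive upper bound, same encoding */
    bool        is_default; /* DEFAULT partition: bounds are ignored */
} ToyRangePart;

/*
 * Can this partition contain a row with key > bound?
 * A bounded partition [lo, hi) can only do so when hi > bound.
 * The DEFAULT partition is conservatively always kept.
 */
bool
toy_may_match_gt(const ToyRangePart *part, int bound)
{
    assert(part != NULL);
    if (part->is_default)
        return true;
    return part->hi > bound;
}

/* Count the partitions that still have to be scanned for "key > bound". */
int
toy_count_surviving(const ToyRangePart *parts, int nparts, int bound)
{
    int         survivors = 0;

    for (int i = 0; i < nparts; i++)
    {
        if (toy_may_match_gt(&parts[i], bound))
            survivors++;
    }
    return survivors;
}
```

With the bounds from the report and a bound of 20180601, only the default partition survives, which matches the first plan. The rest of the thread is about the precondition this sketch takes for granted: the comparison value has to be known at the moment pruning runs.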

#2 Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Pavel Stehule (#1)
1 attachment(s)
Re: why partition pruning doesn't work?

On 1 June 2018 at 07:19, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Partition pruning is working now.

Is it expected? Tested on fresh master.

That's interesting. So there are two cases:

* vlozeno > (select current_date) (pruning works)

* vlozeno > current_date (pruning doesn't work)

In pull_partkey_params when we need to extract Params matching partition key in
the first case everything is fine, we've got an expr of type Param. In the
second case we've got a SQLValueFunction, which is ignored in the code - so
eventually we think that there is nothing matching a partition key and we don't
need to apply pruning.

With the attached hacky patch it would be taken into account (although I assume
in reality SQLValueFunction should be treated somehow differently) and pruning
is happening:

=# explain analyze select * from data where vlozeno > current_date;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..17223.38 rows=19512 width=9) (actual time=0.456..32.952 rows=19340 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..14272.18 rows=8130 width=9) (actual time=0.042..26.616 rows=6447 loops=3)
         ->  Parallel Seq Scan on data_2016  (cost=0.00..5771.19 rows=24 width=9) (never executed)
               Filter: (vlozeno > CURRENT_DATE)
         ->  Parallel Seq Scan on data_2017  (cost=0.00..5747.65 rows=23 width=9) (never executed)
               Filter: (vlozeno > CURRENT_DATE)
         ->  Parallel Seq Scan on data_other  (cost=0.00..2712.69 rows=11431 width=9) (actual time=0.032..26.031 rows=6447 loops=3)
               Filter: (vlozeno > CURRENT_DATE)
               Rows Removed by Filter: 57084
 Planning Time: 1.256 ms
 Execution Time: 35.327 ms
(13 rows)

Time: 40.291 ms

Attachments:

partpruning_sql_value_function.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a6..61c6c0c 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2705,6 +2705,15 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 		{
 			Expr	   *expr = lfirst(lc2);
 
+			if (IsA(expr, SQLValueFunction))
+			{
+				Param	   *param = (Param *) expr;
+
+				pinfo->execparams = bms_add_member(pinfo->execparams,
+												   param->paramid);
+				gotone = true;
+			}
+
 			if (IsA(expr, Param))
 			{
 				Param	   *param = (Param *) expr;
@@ -3038,6 +3047,7 @@ partkey_datum_from_expr(PartitionPruneContext *context,
 			return true;
 
 		case T_Param:
+		case T_SQLValueFunction:
 
 			/*
 			 * When being called from the executor we may be able to evaluate

#3 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Dmitry Dolgov (#2)
Re: why partition pruning doesn't work?

On Fri, Jun 1, 2018 at 9:47 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 1 June 2018 at 07:19, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Partition pruning is working now.

Is it expected? Tested on fresh master.

That's interesting. So there are two cases:

* vlozeno > (select current_date) (pruning works)

* vlozeno > current_date (pruning doesn't work)

In pull_partkey_params when we need to extract Params matching partition key in
the first case everything is fine, we've got an expr of type Param. In the
second case we've got a SQLValueFunction, which is ignored in the code - so
eventually we think that there is nothing matching a partition key and we don't
need to apply pruning.

With the attached hacky patch it would be taken into account (although I assume
in reality SQLValueFunction should be treated somehow differently) and pruning
is happening:

I think the patch is right if we were to handle only SQLValueFunction,
but the bigger picture here is that we aren't evaluating stable
functions before run-time partition pruning happens. I was under the
impression that the stable functions/expressions get evaluated and
folded into a constant just before the execution begins since a stable
function produces the same output for same input during one execution
invocation. But I am not able to find where we do that and probably we
don't do that at all. If we could do that then it's a matter of using
the same methods as plan-time partition pruning to prune the partitions.

If we go ahead with this patch, we should at least update it to handle
stable functions for the sake of completeness.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#3)
Re: why partition pruning doesn't work?

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

I think the patch is right if we were to handle only SQLValueFunction,
but the bigger picture here is that we aren't evaluating stable
functions before run-time partition pruning happens. I was under the
impression that the stable functions/expressions get evaluated and
folded into a constant just before the execution begins since a stable
function produces the same output for same input during one execution
invocation. But I am not able to find where we do that and probably we
don't do that at all.

We don't; there was a patch floating around to make that happen, but
it hasn't been updated lately.

I agree though that it seems strange to special-case SQLValueFunction
rather than any-stable-expression. As long as the evaluation happens
at executor start (i.e. with the query's run-time snapshot) it should
be reasonable to simplify any stable expression.

It's worth questioning whether this is a bug fix or an improvement.
If the latter, it probably ought to wait for v12.

regards, tom lane

#5 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#4)
Re: why partition pruning doesn't work?

2018-06-01 17:53 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

I think the patch is right if we were to handle only SQLValueFunction,
but the bigger picture here is that we aren't evaluating stable
functions before run-time partition pruning happens. I was under the
impression that the stable functions/expressions get evaluated and
folded into a constant just before the execution begins since a stable
function produces the same output for same input during one execution
invocation. But I am not able to find where we do that and probably we
don't do that at all.

We don't; there was a patch floating around to make that happen, but
it hasn't been updated lately.

I agree though that it seems strange to special-case SQLValueFunction
rather than any-stable-expression. As long as the evaluation happens
at executor start (i.e. with the query's run-time snapshot) it should
be reasonable to simplify any stable expression.

It's worth questioning whether this is a bug fix or an improvement.
If the latter, it probably ought to wait for v12.

The result is correct, but it was an unpleasant surprise. I was looking for the
simplest possible demo of this feature, and it doesn't work. Filtering on
CURRENT_DATE is common.

Regards

Pavel


#6 Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#4)
Re: why partition pruning doesn't work?

On Fri, Jun 1, 2018 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I agree though that it seems strange to special-case SQLValueFunction
rather than any-stable-expression. As long as the evaluation happens
at executor start (i.e. with the query's run-time snapshot) it should
be reasonable to simplify any stable expression.

It's worth questioning whether this is a bug fix or an improvement.
If the latter, it probably ought to wait for v12.

If explaining the change requires reference to tokens from the source code,
rather than something an end user could understand, I'd argue it is a bug
fix rather than an improvement.

Cheers,

Jeff

#7 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Jeff Janes (#6)
Re: why partition pruning doesn't work?

On Sat, Jun 2, 2018 at 5:16 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Fri, Jun 1, 2018 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I agree though that it seems strange to special-case SQLValueFunction
rather than any-stable-expression. As long as the evaluation happens
at executor start (i.e. with the query's run-time snapshot) it should
be reasonable to simplify any stable expression.

It's worth questioning whether this is a bug fix or an improvement.
If the latter, it probably ought to wait for v12.

If explaining the change requires reference to tokens from the source code,
rather than something an end user could understand, I'd argue it is a bug
fix rather than an improvement.

If we are going to implement stable expression folding before the actual
execution starts, that's a feature in itself, so it's V12 material.
Partition pruning will use that feature. I don't think we should make
partition pruning work with stable expressions in some ad-hoc way in
V11 and then have some future release (most likely V12) reimplement it on
top of a stable expression folding feature. So my vote is for making it
work in V12.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#6)
Re: why partition pruning doesn't work?

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Jun 1, 2018 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's worth questioning whether this is a bug fix or an improvement.
If the latter, it probably ought to wait for v12.

If explaining the change requires reference to tokens from the source code,
rather than something an end user could understand, I'd argue it is a bug
fix rather than an improvement.

Well, the difference between volatile, stable and immutable functions is
well-documented, so I don't think that's a great argument. If there's
some aspect of this behavior that's not predictable from understanding
which class the partition key expression falls into, then I could agree
that's a bug.

regards, tom lane

#9 Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#8)
1 attachment(s)
Re: why partition pruning doesn't work?

On 1 June 2018 at 17:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

I think the patch is right if we were to handle only SQLValueFunction,
but the bigger picture here is that we aren't evaluating stable
functions before run-time partition pruning happens.

I agree though that it seems strange to special-case SQLValueFunction
rather than any-stable-expression. As long as the evaluation happens
at executor start (i.e. with the query's run-time snapshot) it should
be reasonable to simplify any stable expression.

Just to clarify for myself, for evaluating any stable function here would it be
enough to handle all function-like expressions (FuncExpr / OpExpr /
DistinctExpr / NullIfExpr) and check a corresponding function for provolatile,
like in the attached patch?

Attachments:

partpruning_stable_func.patch (application/octet-stream)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a684d..b9e801e1d0 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -43,6 +43,7 @@
 #include "access/nbtree.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
+#include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -2704,28 +2705,80 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 		foreach(lc2, stepop->exprs)
 		{
 			Expr	   *expr = lfirst(lc2);
+			char		provolatile;
+			Param	   *param;
 
-			if (IsA(expr, Param))
+			switch (nodeTag(expr))
 			{
-				Param	   *param = (Param *) expr;
+				case T_OpExpr:
+				case T_DistinctExpr:
+				case T_NullIfExpr:
+				{
+					OpExpr	   *op = (OpExpr *) expr;
+					provolatile =  func_volatile(op->opfuncid);
+
+					if (provolatile == PROVOLATILE_STABLE ||
+						provolatile == PROVOLATILE_IMMUTABLE)
+					{
+						param = (Param *) expr;
+						pinfo->execparams = bms_add_member(pinfo->execparams,
+														   param->paramid);
 
-				switch (param->paramkind)
+					}
+					gotone = true;
+					break;
+				}
+				case T_FuncExpr:
 				{
-					case PARAM_EXTERN:
-						pinfo->extparams = bms_add_member(pinfo->extparams,
-														  param->paramid);
-						break;
-					case PARAM_EXEC:
+					FuncExpr	   *func = (FuncExpr *) expr;
+					provolatile =  func_volatile(func->funcid);
+
+					if (provolatile == PROVOLATILE_STABLE ||
+						provolatile == PROVOLATILE_IMMUTABLE)
+					{
+						param = (Param *) expr;
 						pinfo->execparams = bms_add_member(pinfo->execparams,
 														   param->paramid);
-						break;
 
-					default:
-						elog(ERROR, "unrecognized paramkind: %d",
-							 (int) param->paramkind);
-						break;
+					}
+					gotone = true;
+					break;
 				}
-				gotone = true;
+				case T_SQLValueFunction:
+				{
+					param = (Param *) expr;
+
+					pinfo->execparams = bms_add_member(pinfo->execparams,
+													   param->paramid);
+					gotone = true;
+					break;
+				}
+				case T_Param:
+				{
+					param = (Param *) expr;
+
+					switch (param->paramkind)
+					{
+						case PARAM_EXTERN:
+							pinfo->extparams = bms_add_member(pinfo->extparams,
+															  param->paramid);
+							break;
+						case PARAM_EXEC:
+							pinfo->execparams = bms_add_member(pinfo->execparams,
+															   param->paramid);
+							break;
+
+						default:
+							elog(ERROR, "unrecognized paramkind: %d",
+								 (int) param->paramkind);
+							break;
+					}
+					gotone = true;
+					break;
+				}
+				default:
+					gotone = false;
+					break;
 			}
 		}
 	}
@@ -3038,6 +3091,8 @@ partkey_datum_from_expr(PartitionPruneContext *context,
 			return true;
 
 		case T_Param:
+		case T_SQLValueFunction:
+		case T_FuncExpr:
 
 			/*
 			 * When being called from the executor we may be able to evaluate

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Dolgov (#9)
Re: why partition pruning doesn't work?

Dmitry Dolgov <9erthalion6@gmail.com> writes:

Just to clarify for myself, for evaluating any stable function here would it be
enough to handle all function-like expressions (FuncExpr / OpExpr /
DistinctExpr / NullIfExpr) and check a corresponding function for provolatile,
like in the attached patch?

I think the entire approach is wrong here. Rather than concerning
yourself with Params, or any other specific expression type, you
should be using !contain_volatile_functions() to decide whether
an expression is run-time-constant. If it is, use the regular
expression evaluation machinery to extract the value.

regards, tom lane
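
[Editorial illustration] The suggestion above can be sketched in miniature as follows. This is a toy model under stated assumptions, not PostgreSQL source: ToyExpr, ToyVolatility and both functions are hypothetical names, and the real contain_volatile_functions() works on PostgreSQL's own node trees and catalog volatility markings. Column references are deliberately left out of the model. What the sketch shows is the shape of the test: walk the whole expression and classify it by the volatility of every function it calls, instead of switching on the type of the top-level node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counterparts of the three documented volatility classes. */
typedef enum { VOL_IMMUTABLE, VOL_STABLE, VOL_VOLATILE } ToyVolatility;

/* Toy node kinds; real expression trees have many more. */
typedef enum { TOY_CONST, TOY_PARAM, TOY_FUNC } ToyNodeKind;

typedef struct ToyExpr
{
    ToyNodeKind kind;
    ToyVolatility volatility;       /* only meaningful for TOY_FUNC */
    const struct ToyExpr *args[2];  /* child expressions, if any */
    int         nargs;              /* number of valid entries in args */
} ToyExpr;

/* Recursively report whether any function call in the tree is volatile. */
bool
toy_contains_volatile(const ToyExpr *e)
{
    if (e == NULL)
        return false;
    assert(e->nargs >= 0 && e->nargs <= 2);
    if (e->kind == TOY_FUNC && e->volatility == VOL_VOLATILE)
        return true;
    for (int i = 0; i < e->nargs; i++)
    {
        if (toy_contains_volatile(e->args[i]))
            return true;
    }
    return false;
}

/*
 * An expression without volatile calls keeps one value for the whole
 * execution, so it can be evaluated once at executor start and then
 * used for pruning, whatever its top-level node type is.
 */
bool
toy_is_runtime_constant(const ToyExpr *e)
{
    return !toy_contains_volatile(e);
}
```

In this model a stable call such as CURRENT_DATE, or an immutable operator applied to it, counts as run-time constant, while anything containing a volatile call such as random() does not.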

#11 Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#10)
1 attachment(s)
Re: why partition pruning doesn't work?

On 3 June 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dmitry Dolgov <9erthalion6@gmail.com> writes:

Just to clarify for myself, for evaluating any stable function here would it be
enough to handle all function-like expressions (FuncExpr / OpExpr /
DistinctExpr / NullIfExpr) and check a corresponding function for provolatile,
like in the attached patch?

I think the entire approach is wrong here. Rather than concerning
yourself with Params, or any other specific expression type, you
should be using !contain_volatile_functions() to decide whether
an expression is run-time-constant. If it is, use the regular
expression evaluation machinery to extract the value.

Yes, it makes sense. Then, to my understanding, the attached code is close to
what was described above (although I'm not sure about the Const part).

Attachments:

partpruning_stable_func_v2.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a6..f02bd30 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2705,6 +2705,9 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 		{
 			Expr	   *expr = lfirst(lc2);
 
+			if (contain_volatile_functions((Node *) expr))
+				continue;
+
 			if (IsA(expr, Param))
 			{
 				Param	   *param = (Param *) expr;
@@ -2727,6 +2730,13 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 				}
 				gotone = true;
 			}
+			else if (!IsA(expr, Const))
+			{
+				Param	*param = (Param *) expr;
+				pinfo->execparams = bms_add_member(pinfo->execparams,
+												   param->paramid);
+				gotone = true;
+			}
 		}
 	}
 
@@ -3031,37 +3041,35 @@ static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value)
 {
-	switch (nodeTag(expr))
-	{
-		case T_Const:
-			*value = ((Const *) expr)->constvalue;
-			return true;
-
-		case T_Param:
-
-			/*
-			 * When being called from the executor we may be able to evaluate
-			 * the Param's value.
-			 */
-			if (context->planstate &&
-				bms_is_member(((Param *) expr)->paramid, context->safeparams))
-			{
-				ExprState  *exprstate;
-				ExprContext *ectx;
-				bool		isNull;
+	if (contain_volatile_functions((Node *) expr))
+		return false;
 
-				exprstate = context->exprstates[stateidx];
-				ectx = context->planstate->ps_ExprContext;
-				*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-				if (isNull)
-					return false;
+	if (IsA(expr, Const))
+	{
+		*value = ((Const *) expr)->constvalue;
+		return true;
+	}
+	else
+	{
+		/*
+		 * When being called from the executor we may be able to evaluate
+		 * the Param's value.
+		 */
+		if (context->planstate &&
+			bms_is_member(((Param *) expr)->paramid, context->safeparams))
+		{
+			ExprState  *exprstate;
+			ExprContext *ectx;
+			bool		isNull;
 
-				return true;
-			}
-			break;
+			exprstate = context->exprstates[stateidx];
+			ectx = context->planstate->ps_ExprContext;
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
+			if (isNull)
+				return false;
 
-		default:
-			break;
+			return true;
+		}
 	}
 
 	return false;

#12 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Dmitry Dolgov (#11)
Re: why partition pruning doesn't work?

Hi Dmitry,

Thanks for creating the patch. I looked at it and have some comments.

On 2018/06/04 22:30, Dmitry Dolgov wrote:

On 3 June 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dmitry Dolgov <9erthalion6@gmail.com> writes:

Just to clarify for myself, for evaluating any stable function here would it be
enough to handle all function-like expressions (FuncExpr / OpExpr /
DistinctExpr / NullIfExpr) and check a corresponding function for provolatile,
like in the attached patch?

I think the entire approach is wrong here. Rather than concerning
yourself with Params, or any other specific expression type, you
should be using !contain_volatile_functions() to decide whether
an expression is run-time-constant. If it is, use the regular
expression evaluation machinery to extract the value.

Yes, it makes sense. Then, to my understanding, the attached code is close to
what was described above (although I'm not sure about the Const part).

This:

@@ -2727,6 +2730,13 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List
*steps)
                 }
                 gotone = true;
             }
+            else if (!IsA(expr, Const))
+            {
+                Param   *param = (Param *) expr;
+                pinfo->execparams = bms_add_member(pinfo->execparams,
+                                                   param->paramid);
+                gotone = true;
+            }

doesn't look quite right. What says expr is really a Param? The patch
appears to work because, by setting pinfo->execparams to *something*, it
triggers execution-time pruning to run; its contents aren't necessarily
used during execution pruning. In fact, it would've crashed if the
execution-time pruning code had required execparams to contain *valid*
param id, but currently it doesn't.

What I think we'd need to do to make this work is to make execution-time
pruning be invoked even if there aren't any Params involved. IOW, let's
try to teach make_partition_pruneinfo that it can go ahead also in the
cases where there are expressions being compared with the partition key
that contain (only) stable functions. Then, go and fix the
execution-pruning code to not *always* expect there to be Params to prune
with.

Maybe, David (added to cc) has something to say about all this, especially
whether he considers this a feature and not a bug fix.

Thanks,
Amit

#13 Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Amit Langote (#12)
Re: why partition pruning doesn't work?

On 5 June 2018 at 12:31, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

doesn't look quite right. What says expr is really a Param? The patch
appears to work because, by setting pinfo->execparams to *something*, it
triggers execution-time pruning to run; its contents aren't necessarily
used during execution pruning. In fact, it would've crashed if the
execution-time pruning code had required execparams to contain *valid*
param id, but currently it doesn't.

What I think we'd need to do to make this work is to make execution-time
pruning be invoked even if there aren't any Params involved. IOW, let's
try to teach make_partition_pruneinfo that it can go ahead also in the
cases where there are expressions being compared with the partition key
that contain (only) stable functions. Then, go and fix the
execution-pruning code to not *always* expect there to be Params to prune
with.

Yeah, I agree - I copied this approach mindlessly from the original hacky
patch. So, looks like it's necessary to have something like got_stable_expr
together with gotparam. And after that the only place where I see Params
are in use is partkey_datum_from_expr where all the stuff is actually
evaluated. So apparently this part about "fix the execution-pruning code to not
*always* expect there to be Params to prune with" will be only about this
function - am I correct or there is something else that I missed?

#14 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Dmitry Dolgov (#13)
Re: why partition pruning doesn't work?

On Tue, Jun 5, 2018 at 6:24 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 5 June 2018 at 12:31, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

doesn't look quite right. What says expr is really a Param? The patch
appears to work because, by setting pinfo->execparams to *something*, it
triggers execution-time pruning to run; its contents aren't necessarily
used during execution pruning. In fact, it would've crashed if the
execution-time pruning code had required execparams to contain *valid*
param id, but currently it doesn't.

What I think we'd need to do to make this work is to make execution-time
pruning be invoked even if there aren't any Params involved. IOW, let's
try to teach make_partition_pruneinfo that it can go ahead also in the
cases where there are expressions being compared with the partition key
that contain (only) stable functions. Then, go and fix the
execution-pruning code to not *always* expect there to be Params to prune
with.

Yeah, I agree - I copied this approach mindlessly from the original hacky
patch. So, looks like it's necessary to have something like got_stable_expr
together with gotparam.

I think the current code relies heavily on Params being present
for partition pruning, which isn't a valid assumption. Runtime partition
pruning is possible when there are comparison conditions with partition key
expressions on one side and "execution time constant" expressions on
the other side. By "execution time constant" expression, I mean any
expression that evaluates to a constant at the time of execution, such as
a stable expression (not just a function) or a Param expression. I can
think of only these two at this time, but there can be more. So,
gotparam should be renamed to something generic like "gotprunable_cond",
and pull_partkey_params() should be renamed to "pull_partkey_conds" or
something similarly generic. That function would return true if there
exists an expression in steps which can be evaluated to a constant at
runtime; otherwise it returns false. My guess is there will be false
positives which need to be dealt with later, but there will be no
false negatives.

And after that the only place where I see Params
are in use is partkey_datum_from_expr where all the stuff is actually
evaluated. So apparently this part about "fix the execution-pruning code to not
*always* expect there to be Params to prune with" will be only about this
function - am I correct or there is something else that I missed?

Yes. But I think trying to evaluate parameters in this function is not
good. The approach of folding constant expressions before or
immediately after the execution starts doesn't require the expressions
to be evaluated in partkey_datum_from_expr and might benefit other
places where stable expressions or params can appear.

Another problem with partkey_datum_from_expr() seems to be that it
evaluates only Param nodes, but not expressions involving
parameters which can be folded into constants at runtime. Take for
example the following queries on table t1 with two partitions, (0, 100) and
(100, 200), populated using "insert into t1 select i, i from
generate_series(0, 199) i;". There's an index on t1(a).

explain analyze select * from t1 x left join t1 y on x.a = y.b where y.a = 5;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..6.78 rows=1 width=16) (actual time=0.033..0.066 rows=1 loops=1)
   ->  Append  (cost=0.00..2.25 rows=1 width=8) (actual time=0.019..0.035 rows=1 loops=1)
         ->  Seq Scan on t1p1 y  (cost=0.00..2.25 rows=1 width=8) (actual time=0.018..0.035 rows=1 loops=1)
               Filter: (a = 5)
               Rows Removed by Filter: 99
   ->  Append  (cost=0.00..4.51 rows=2 width=8) (actual time=0.011..0.027 rows=1 loops=1)
         ->  Seq Scan on t1p1 x  (cost=0.00..2.25 rows=1 width=8) (actual time=0.006..0.022 rows=1 loops=1)
               Filter: (y.b = a)
               Rows Removed by Filter: 99
         ->  Seq Scan on t1p2 x_1  (cost=0.00..2.25 rows=1 width=8) (never executed)
               Filter: (y.b = a)
 Planning Time: 0.644 ms
 Execution Time: 0.115 ms
(13 rows)

t1p2 x_1 is never scanned, indicating that run-time partition pruning
happened. But then see the following query:

postgres:17889=# explain analyze select * from t1 x left join t1 y on x.a = y.b + 100 where y.a = 5;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..7.28 rows=1 width=16) (actual time=0.055..0.093 rows=1 loops=1)
   ->  Append  (cost=0.00..2.25 rows=1 width=8) (actual time=0.017..0.034 rows=1 loops=1)
         ->  Seq Scan on t1p1 y  (cost=0.00..2.25 rows=1 width=8) (actual time=0.016..0.033 rows=1 loops=1)
               Filter: (a = 5)
               Rows Removed by Filter: 99
   ->  Append  (cost=0.00..5.01 rows=2 width=8) (actual time=0.034..0.054 rows=1 loops=1)
         ->  Seq Scan on t1p1 x  (cost=0.00..2.50 rows=1 width=8) (actual time=0.026..0.026 rows=0 loops=1)
               Filter: ((y.b + 100) = a)
               Rows Removed by Filter: 100
         ->  Seq Scan on t1p2 x_1  (cost=0.00..2.50 rows=1 width=8) (actual time=0.007..0.027 rows=1 loops=1)
               Filter: ((y.b + 100) = a)
               Rows Removed by Filter: 99
 Planning Time: 0.424 ms
 Execution Time: 0.139 ms
(14 rows)

The scan on t1p1 x returns no rows and should have been pruned since
y.b + 100 is constant for a given y.b.

But for this to work, folding constant expressions doesn't help, since
y.b changes with every rescan of t1 x. So maybe we need some way to
constant-fold expressions during ExecutorRewind() as well.

This is a digression from the original report, but it's still within the
scope of "why partition pruning doesn't work?"

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#15David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#12)
1 attachment(s)
Re: why partition pruning doesn't work?

On 5 June 2018 at 22:31, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Maybe, David (added to cc) has something to say about all this, especially
whether he considers this a feature and not a bug fix.

Thanks, Amit. I had missed this thread.

Yeah. I admit if I'd thought about this case when I wrote the code,
then I'd have made any non-volatile Expr work, but I didn't :-(

It was pointed out to me a few months ago in a comment in [1]. I
initially thought that this was v12 material, but it seems there are a
few people here that are pretty unhappy about it.

I was going to describe what such a patch should look like here, but
that seemed like about as much work as writing it, so:

Please see the attached patch. I've only just finished with it and
it's not fully done yet as there's still an XXX comment where I've not
quite thought about Exprs with Vars from higher levels. These might
always be converted to Params, so the code might be okay as is, but
I've not checked this yet, hence the comment remains.

I'm slightly leaning towards this being included in v11. Without this
people are forced into hacks like WHERE partkey = (SELECT
stablefunc()); to get pruning working at all. If that SQL remains
after this patch then pruning can only take place during actual
execution. With the attached patch the pruning can take place during
the initialization of the executor, which in cases with many
partitions can be significantly faster, providing actual execution is
short. I'd rather people didn't get into bad habits like that if we
can avoid it.

[1]: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

run-time_pruning_for_exprs.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c83991c93c..f9a812480d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1349,7 +1349,11 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * difference between these that we need to concern ourselves with is the
  * time when the values of the Params are known.  External Param values are
  * known at any time of execution, including executor startup, but exec Param
- * values are only known when the executor is running.
+ * values are only known when the executor is running.  We also support
+ * pruning using any stable expression which does not contain any Vars.
+ * Immutable expressions would have been evaluated to a Const during planning,
+ * so plan-time pruning would have taken care of any pruning.  It's not
+ * possible for pruning to take place using volatile expressions.
  *
  * For external Params we may be able to prune away unneeded partitions
  * during executor startup.  This has the added benefit of not having to
@@ -1418,6 +1422,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 	prunestate->num_partprunedata = list_length(partitionpruneinfo);
 	prunestate->extparams = NULL;
 	prunestate->execparams = NULL;
+	prunestate->hasparamlessexprs = false;
 
 	/*
 	 * Create a sub memory context which we'll use when making calls to the
@@ -1513,6 +1518,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		pprune->pruning_steps = pinfo->pruning_steps;
 		pprune->extparams = bms_copy(pinfo->extparams);
 		pprune->allparams = bms_union(pinfo->extparams, pinfo->execparams);
+		pprune->hasparamlessexprs = pinfo->hasparamlessexprs;
 
 		/*
 		 * Accumulate the paramids which match the partitioned keys of all
@@ -1524,6 +1530,8 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		prunestate->execparams = bms_add_members(prunestate->execparams,
 												 pinfo->execparams);
 
+		prunestate->hasparamlessexprs |= pinfo->hasparamlessexprs;
+
 		relation_close(rel, NoLock);
 
 		i++;
@@ -1566,10 +1574,11 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	Bitmapset  *result = NULL;
 
 	/*
-	 * Ensure there's actually external params, or we've not been called
-	 * already.
+	 * If there are no parameter-less Exprs then ensure there's actually
+	 * external params, or we've not been called already.
 	 */
-	Assert(!bms_is_empty(prunestate->extparams));
+	Assert(prunestate->hasparamlessexprs ||
+		   !bms_is_empty(prunestate->extparams));
 
 	pprune = prunestate->partprunedata;
 
@@ -1739,14 +1748,14 @@ find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 
 	/*
 	 * We only need to determine the matching partitions if there are any
-	 * params matching the partition key at this level.  If there are no
-	 * matching params, then we can simply return all subnodes which belong to
-	 * this parent partition.  The planner should have already determined
-	 * these to be the minimum possible set.  We must still recursively visit
-	 * any subpartitioned tables as we may find their partition keys match
-	 * some Params at their level.
+	 * params matching the partition key at this level, or if there are any
+	 * parameter-less expressions matching the partition key.  However, if
+	 * it's just parameter-less expressions then we only prune during
+	 * ExecFindInitialMatchingSubPlans, there's no point in doing this from
+	 * ExecFindMatchingSubPlans too, once is enough.
 	 */
-	if (!bms_is_empty(pruneparams))
+	if ((!allparams && pprune->hasparamlessexprs) ||
+		!bms_is_empty(pruneparams))
 	{
 		context->safeparams = pruneparams;
 		partset = get_matching_partitions(context,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6bc3e470bf..275254aa65 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -139,10 +139,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 												  node->part_prune_infos);
 
 		/*
-		 * When there are external params matching the partition key we may be
-		 * able to prune away Append subplans now.
+		 * When there are parameter-less exprs or any external params matching
+		 * the partition key we may be able to prune away Append subplans now.
 		 */
-		if (!bms_is_empty(prunestate->extparams))
+		if (prunestate->hasparamlessexprs ||
+			!bms_is_empty(prunestate->extparams))
 		{
 			/* Determine which subplans match the external params */
 			validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c045a7afe..78b737b2b9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2174,6 +2174,7 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 	COPY_SCALAR_FIELD(reloid);
 	COPY_NODE_FIELD(pruning_steps);
 	COPY_BITMAPSET_FIELD(present_parts);
+	COPY_SCALAR_FIELD(hasparamlessexprs);
 	COPY_SCALAR_FIELD(nparts);
 	COPY_POINTER_FIELD(subnode_map, from->nparts * sizeof(int));
 	COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 610f9edaf5..846055d68d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1741,6 +1741,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
 	WRITE_OID_FIELD(reloid);
 	WRITE_NODE_FIELD(pruning_steps);
 	WRITE_BITMAPSET_FIELD(present_parts);
+	WRITE_BOOL_FIELD(hasparamlessexprs);
 	WRITE_INT_FIELD(nparts);
 
 	appendStringInfoString(str, " :subnode_map");
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2826cec2f8..67f30f431e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1362,6 +1362,7 @@ _readPartitionPruneInfo(void)
 	READ_OID_FIELD(reloid);
 	READ_NODE_FIELD(pruning_steps);
 	READ_BITMAPSET_FIELD(present_parts);
+	READ_BOOL_FIELD(hasparamlessexprs);
 	READ_INT_FIELD(nparts);
 	READ_INT_ARRAY(subnode_map, local_node->nparts);
 	READ_INT_ARRAY(subpart_map, local_node->nparts);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a684d..cb2969403d 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -53,6 +53,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
+#include "optimizer/var.h"
 #include "partitioning/partprune.h"
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
@@ -115,6 +116,24 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
+/*
+ * expression_tree_walker context struct for gathering paramids of params
+ * matching the partition key.
+ */
+typedef struct PullParamContext
+{
+	Bitmapset *extparams;
+	Bitmapset *execparams;
+} PullParamContext;
+
+/*
+ * expression_tree_walker context struct for checking if an Expr contains any
+ * Params not listed in 'safeparams'.
+ */
+typedef struct SafeParamContext
+{
+	Bitmapset *safeparams;
+} SafeParamContext;
 
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
@@ -162,6 +181,8 @@ static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
 static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
 						  StrategyNumber opstrategy, Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static void pull_params(Expr *expr, PartitionPruneInfo *pinfo);
+static bool pull_params_walker(Node *node, PullParamContext *context);
 static bool pull_partkey_params(PartitionPruneInfo *pinfo, List *steps);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
@@ -172,6 +193,9 @@ static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
 							   Expr *partkey, Expr **outconst);
 static bool partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value);
+static bool contains_only_safeparams_walker(Node *node,
+								SafeParamContext *context);
+static bool contains_only_safeparams(Expr *expr, Bitmapset *safeparams);
 
 /*
  * make_partition_pruneinfo
@@ -197,7 +221,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	int		   *relid_subnode_map;
 	int		   *relid_subpart_map;
 	int			i;
-	bool		gotparam = false;
+	bool		gotnonconst = false;
 
 	/*
 	 * Allocate two arrays to store the 1-based indexes of the 'subpaths' and
@@ -326,11 +350,12 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		pinfo->subpart_map = subpart_map;
 
 		/*
-		 * Extract Params matching partition key and record if we got any.
-		 * We'll not bother enabling run-time pruning if no params matched the
-		 * partition key at any level of partitioning.
+		 * Extract Params matching partition key and record if any steps
+		 * compare a non-Const value to the partition key.  If everything
+		 * is Const then we've no need to perform run-time pruning as the
+		 * planner will have already selected the minimum set of partitions.
 		 */
-		gotparam |= pull_partkey_params(pinfo, pruning_steps);
+		gotnonconst |= pull_partkey_params(pinfo, pruning_steps);
 
 		pinfolist = lappend(pinfolist, pinfo);
 	}
@@ -338,14 +363,11 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	pfree(relid_subnode_map);
 	pfree(relid_subpart_map);
 
-	if (gotparam)
+	/* Enable pruning if we got any non-Consts */
+	if (gotnonconst)
 		return pinfolist;
 
-	/*
-	 * If no Params were found to match the partition key on any of the
-	 * partitioned relations then there's no point doing any run-time
-	 * partition pruning.
-	 */
+	/* Run-time pruning would be useless */
 	return NIL;
 }
 
@@ -1478,6 +1500,11 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		if (contain_volatile_functions((Node *) expr))
 			return PARTCLAUSE_UNSUPPORTED;
 
+		/* We can't prune using an expression with Vars */
+		/* XXX this only checks for vars at level 0, We need to disable any Var */
+		if (contain_var_clause((Node *) expr))
+			return PARTCLAUSE_UNSUPPORTED;
+
 		/*
 		 * Determine the input types of the operator we're considering.
 		 *
@@ -2682,16 +2709,74 @@ get_matching_range_bounds(PartitionPruneContext *context,
 	return result;
 }
 
+/*
+ * pull_params
+ *		Determine all external and exec params in 'expr' and add the found
+ *		paramids to the appropriate 'pinfo' field.  pinfo's hasparamlessexprs
+ *		field is set if any non-Const expression is found which have no
+ *		parameters.
+ */
+static void
+pull_params(Expr *expr, PartitionPruneInfo *pinfo)
+{
+	PullParamContext context;
+
+	context.extparams = NULL;
+	context.execparams = NULL;
+
+	pull_params_walker((Node *) expr, &context);
+
+	pinfo->extparams = bms_add_members(pinfo->extparams, context.extparams);
+	pinfo->execparams = bms_add_members(pinfo->execparams,
+										context.execparams);
+
+	/* Mark that an Expr has been seen which contains no Params */
+	if (!context.extparams && !context.execparams)
+		pinfo->hasparamlessexprs = true;
+}
+
+static bool
+pull_params_walker(Node *node, PullParamContext *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		switch (param->paramkind)
+		{
+			case PARAM_EXTERN:
+				context->extparams = bms_add_member(context->extparams,
+													param->paramid);
+				break;
+			case PARAM_EXEC:
+				context->execparams = bms_add_member(context->execparams,
+													 param->paramid);
+				break;
+
+			default:
+				elog(ERROR, "unrecognized paramkind: %d",
+					 (int) param->paramkind);
+				break;
+		}
+	}
+	return expression_tree_walker(node, pull_params_walker,
+								  (void *) context);
+}
+
 /*
  * pull_partkey_params
  *		Loop through each pruning step and record each external and exec
  *		Params being compared to the partition keys.
+ *
+ * Returns true if any non-const value is being compared to the partition key.
  */
 static bool
 pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 {
 	ListCell   *lc;
-	bool		gotone = false;
+	bool		gotnonconst = false;
 
 	foreach(lc, steps)
 	{
@@ -2705,32 +2790,15 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 		{
 			Expr	   *expr = lfirst(lc2);
 
-			if (IsA(expr, Param))
-			{
-				Param	   *param = (Param *) expr;
-
-				switch (param->paramkind)
-				{
-					case PARAM_EXTERN:
-						pinfo->extparams = bms_add_member(pinfo->extparams,
-														  param->paramid);
-						break;
-					case PARAM_EXEC:
-						pinfo->execparams = bms_add_member(pinfo->execparams,
-														   param->paramid);
-						break;
+			if (IsA(expr, Const))
+				continue;
 
-					default:
-						elog(ERROR, "unrecognized paramkind: %d",
-							 (int) param->paramkind);
-						break;
-				}
-				gotone = true;
-			}
+			gotnonconst = true;
+			pull_params(expr, pinfo);
 		}
 	}
 
-	return gotone;
+	return gotnonconst;
 }
 
 /*
@@ -3031,38 +3099,62 @@ static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value)
 {
-	switch (nodeTag(expr))
+	if (IsA(expr, Const))
+	{
+		*value = ((Const *) expr)->constvalue;
+		return true;
+	}
+	else
 	{
-		case T_Const:
-			*value = ((Const *) expr)->constvalue;
+		/*
+		 * When being called during planning constant folding the Param's
+		 * value.
+		 */
+		if (context->planstate &&
+			contains_only_safeparams(expr, context->safeparams))
+		{
+			ExprState  *exprstate;
+			ExprContext *ectx;
+			bool		isNull;
+
+			/* Exprs with volatile functions shouldn't make it here */
+			Assert(!contain_volatile_functions((Node *) expr));
+
+			exprstate = context->exprstates[stateidx];
+			ectx = context->planstate->ps_ExprContext;
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
+			if (isNull)
+				return false;
+
 			return true;
+		}
+	}
 
-		case T_Param:
+	return false;
+}
 
-			/*
-			 * When being called from the executor we may be able to evaluate
-			 * the Param's value.
-			 */
-			if (context->planstate &&
-				bms_is_member(((Param *) expr)->paramid, context->safeparams))
-			{
-				ExprState  *exprstate;
-				ExprContext *ectx;
-				bool		isNull;
+static bool
+contains_only_safeparams(Expr *expr, Bitmapset *safeparams)
+{
+	SafeParamContext context;
 
-				exprstate = context->exprstates[stateidx];
-				ectx = context->planstate->ps_ExprContext;
-				*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-				if (isNull)
-					return false;
+	context.safeparams = safeparams;
 
-				return true;
-			}
-			break;
+	return contains_only_safeparams_walker((Node *) expr, &context);
+}
 
-		default:
-			break;
+static bool
+contains_only_safeparams_walker(Node *node, SafeParamContext *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param *param = (Param *) node;
+		return bms_is_member(param->paramid, context->safeparams);
 	}
+	(void) expression_tree_walker(node, contains_only_safeparams_walker,
+								  (void *) context);
 
-	return false;
+	return true;
 }
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fc6e9574e3..a48cf72d8c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -136,6 +136,8 @@ typedef struct PartitionTupleRouting
  * extparams					Contains paramids of external params found
  *								matching partition keys in 'pruning_steps'.
  * allparams					As 'extparams' but also including exec params.
+ * hasparamlessexprs			Some pruning steps contain Exprs without any
+ *								Params.
  *-----------------------
  */
 typedef struct PartitionPruningData
@@ -147,6 +149,7 @@ typedef struct PartitionPruningData
 	List	   *pruning_steps;
 	Bitmapset  *extparams;
 	Bitmapset  *allparams;
+	bool		hasparamlessexprs;
 } PartitionPruningData;
 
 /*-----------------------
@@ -163,6 +166,7 @@ typedef struct PartitionPruningData
  *						partitioned relation. First element contains the
  *						details for the target partitioned table.
  * num_partprunedata	Number of items in 'partprunedata' array.
+ * hasparamlessexprs	Some pruning steps contain Exprs without any Params.
  * prune_context		A memory context which can be used to call the query
  *						planner's partition prune functions.
  * extparams			All PARAM_EXTERN paramids which were found to match a
@@ -177,6 +181,7 @@ typedef struct PartitionPruneState
 {
 	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
+	bool		hasparamlessexprs;
 	MemoryContext prune_context;
 	Bitmapset  *extparams;
 	Bitmapset  *execparams;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b2a1..aae102126f 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1597,6 +1597,8 @@ typedef struct PartitionPruneInfo
 	List	   *pruning_steps;	/* List of PartitionPruneStep */
 	Bitmapset  *present_parts;	/* Indexes of all partitions which subnodes
 								 * are present for. */
+	bool		hasparamlessexprs;	/* True if Exprs exist which don't contain
+									 * any Params */
 	int			nparts;			/* The length of the following two arrays */
 	int		   *subnode_map;	/* subnode index by partition id, or -1 */
 	int		   *subpart_map;	/* subpart index by partition id, or -1 */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index cf331e79c1..64b4e933d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1835,6 +1835,54 @@ fetch backward all from cur;
 (2 rows)
 
 commit;
+begin;
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=1 loops=1)
+   Subplans Removed: 3
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(1))
+(4 rows)
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=4 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part2 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part3 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part4 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+(9 rows)
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part2 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part3 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part4 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+(13 rows)
+
+rollback;
 drop table list_part;
 -- Parallel append
 -- Suppress the number of loops each parallel node runs for.  This is because
@@ -2079,6 +2127,40 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
 (27 rows)
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+                                      explain_parallel_append                                      
+---------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=2 loops=1)
+         Workers Planned: 1
+         Workers Launched: 1
+         ->  Partial Aggregate (actual rows=1 loops=2)
+               ->  Nested Loop (actual rows=0 loops=2)
+                     ->  Parallel Seq Scan on lprt_a a (actual rows=51 loops=N)
+                           Filter: (a = ANY ('{0,0,1}'::integer[]))
+                     ->  Append (actual rows=0 loops=102)
+                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+(27 rows)
+
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
                                       explain_parallel_append                                      
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1464f4dcd9..b6681fa44c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -396,6 +396,22 @@ fetch backward all from cur;
 
 commit;
 
+begin;
+
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+
+rollback;
+
 drop table list_part;
 
 -- Parallel append
@@ -486,6 +502,10 @@ set enable_mergejoin = 0;
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+
 insert into lprt_a values(3),(3);
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
#16Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#15)
Re: why partition pruning doesn't work?

2018-06-05 17:07 GMT+02:00 David Rowley <david.rowley@2ndquadrant.com>:

On 5 June 2018 at 22:31, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Maybe, David (added to cc) has something to say about all this, especially
whether he considers this a feature and not a bug fix.

Thanks, Amit. I had missed this thread.

Yeah. I admit if I'd thought about this case when I wrote the code,
then I'd have made any non-volatile Expr work, but I didn't :-(

It was pointed out to me a few months ago in a comment in [1]. I
initially thought that this was v12 material, but it seems there are a
few people here that are pretty unhappy about it.

I was going to describe what such a patch should look like here, but
that seemed like about as much work as writing it, so:

Please see the attached patch. I've only just finished with it and
it's not fully done yet as there's still an XXX comment where I've not
quite thought about Exprs with Vars from higher levels. These might
always be converted to Params, so the code might be okay as is, but
I've not checked this yet, hence the comment remains.

I'm slightly leaning towards this being included in v11. Without this
people are forced into hacks like WHERE partkey = (SELECT
stablefunc()); to get pruning working at all. If that SQL remains
after this patch then pruning can only take place during actual
execution. With the attached patch the pruning can take place during
the initialization of the executor, which in cases with many
partitions can be significantly faster, providing actual execution is
short. I'd rather people didn't get into bad habits like that if we
can avoid it.

This is really great

Regards

Pavel

[1] https://blog.2ndquadrant.com/partition-elimination-postgresql-11/

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#15)
1 attachment(s)
Re: why partition pruning doesn't work?

On 6 June 2018 at 03:07, David Rowley <david.rowley@2ndquadrant.com> wrote:

Please see the attached patch. I've only just finished with it and
it's not fully done yet as there's still an XXX comment where I've not
quite thought about Exprs with Vars from higher levels. These might
always be converted to Params, so the code might be okay as is, but
I've not checked this yet, hence the comment remains.

I looked at this again today and decided that the XXX comment could
just be removed. I also changed contains_only_safeparams into
contains_unsafeparams and reversed the condition. I then decided that
I didn't like the way we need to check which params are in the Expr
each time we call partkey_datum_from_expr. It seems better to prepare
this in advance when building the pruning steps. I started work on
that, but soon realised that I'd need to pass a List of Bitmapsets to
the executor. This is a problem as Bitmapset is not a Node type and
cannot be copied with COPY_NODE_FIELD(). This could probably be
refactored: instead of passing 3 Lists in the PartitionPruneStepOp,
we could invent a new node type that just has 3 fields and store a
single List of those.

Anyway, I didn't do that because I'm not sure what the fate of this
patch is going to be. I'd offer to change things around to add a new
Node type, but I don't really want to work on that now if this is v12
material.

I've attached a cleaned up version of the patch I posted yesterday.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

run-time_pruning_for_exprs_v2.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c83991c93c..f9a812480d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1349,7 +1349,11 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * difference between these that we need to concern ourselves with is the
  * time when the values of the Params are known.  External Param values are
  * known at any time of execution, including executor startup, but exec Param
- * values are only known when the executor is running.
+ * values are only known when the executor is running.  We also support
+ * pruning using any stable expression which does not contain any Vars.
+ * Immutable expressions would have been evaluated to a Const during planning,
+ * so plan-time pruning would have taken care of any pruning.  It's not
+ * possible for pruning to take place using volatile expressions.
  *
  * For external Params we may be able to prune away unneeded partitions
  * during executor startup.  This has the added benefit of not having to
@@ -1418,6 +1422,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 	prunestate->num_partprunedata = list_length(partitionpruneinfo);
 	prunestate->extparams = NULL;
 	prunestate->execparams = NULL;
+	prunestate->hasparamlessexprs = false;
 
 	/*
 	 * Create a sub memory context which we'll use when making calls to the
@@ -1513,6 +1518,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		pprune->pruning_steps = pinfo->pruning_steps;
 		pprune->extparams = bms_copy(pinfo->extparams);
 		pprune->allparams = bms_union(pinfo->extparams, pinfo->execparams);
+		pprune->hasparamlessexprs = pinfo->hasparamlessexprs;
 
 		/*
 		 * Accumulate the paramids which match the partitioned keys of all
@@ -1524,6 +1530,8 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		prunestate->execparams = bms_add_members(prunestate->execparams,
 												 pinfo->execparams);
 
+		prunestate->hasparamlessexprs |= pinfo->hasparamlessexprs;
+
 		relation_close(rel, NoLock);
 
 		i++;
@@ -1566,10 +1574,11 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	Bitmapset  *result = NULL;
 
 	/*
-	 * Ensure there's actually external params, or we've not been called
-	 * already.
+	 * If there are no parameter-less Exprs then ensure there's actually
+	 * external params, or we've not been called already.
 	 */
-	Assert(!bms_is_empty(prunestate->extparams));
+	Assert(prunestate->hasparamlessexprs ||
+		   !bms_is_empty(prunestate->extparams));
 
 	pprune = prunestate->partprunedata;
 
@@ -1739,14 +1748,14 @@ find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 
 	/*
 	 * We only need to determine the matching partitions if there are any
-	 * params matching the partition key at this level.  If there are no
-	 * matching params, then we can simply return all subnodes which belong to
-	 * this parent partition.  The planner should have already determined
-	 * these to be the minimum possible set.  We must still recursively visit
-	 * any subpartitioned tables as we may find their partition keys match
-	 * some Params at their level.
+	 * params matching the partition key at this level, or if there are any
+	 * parameter-less expressions matching the partition key.  However, if
+	 * it's just parameter-less expressions then we only prune during
+	 * ExecFindInitialMatchingSubPlans, there's no point in doing this from
+	 * ExecFindMatchingSubPlans too, once is enough.
 	 */
-	if (!bms_is_empty(pruneparams))
+	if ((!allparams && pprune->hasparamlessexprs) ||
+		!bms_is_empty(pruneparams))
 	{
 		context->safeparams = pruneparams;
 		partset = get_matching_partitions(context,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6bc3e470bf..275254aa65 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -139,10 +139,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 												  node->part_prune_infos);
 
 		/*
-		 * When there are external params matching the partition key we may be
-		 * able to prune away Append subplans now.
+		 * When there are parameter-less exprs or any external params matching
+		 * the partition key we may be able to prune away Append subplans now.
 		 */
-		if (!bms_is_empty(prunestate->extparams))
+		if (prunestate->hasparamlessexprs ||
+			!bms_is_empty(prunestate->extparams))
 		{
 			/* Determine which subplans match the external params */
 			validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c045a7afe..78b737b2b9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2174,6 +2174,7 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 	COPY_SCALAR_FIELD(reloid);
 	COPY_NODE_FIELD(pruning_steps);
 	COPY_BITMAPSET_FIELD(present_parts);
+	COPY_SCALAR_FIELD(hasparamlessexprs);
 	COPY_SCALAR_FIELD(nparts);
 	COPY_POINTER_FIELD(subnode_map, from->nparts * sizeof(int));
 	COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 610f9edaf5..846055d68d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1741,6 +1741,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
 	WRITE_OID_FIELD(reloid);
 	WRITE_NODE_FIELD(pruning_steps);
 	WRITE_BITMAPSET_FIELD(present_parts);
+	WRITE_BOOL_FIELD(hasparamlessexprs);
 	WRITE_INT_FIELD(nparts);
 
 	appendStringInfoString(str, " :subnode_map");
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2826cec2f8..67f30f431e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1362,6 +1362,7 @@ _readPartitionPruneInfo(void)
 	READ_OID_FIELD(reloid);
 	READ_NODE_FIELD(pruning_steps);
 	READ_BITMAPSET_FIELD(present_parts);
+	READ_BOOL_FIELD(hasparamlessexprs);
 	READ_INT_FIELD(nparts);
 	READ_INT_ARRAY(subnode_map, local_node->nparts);
 	READ_INT_ARRAY(subpart_map, local_node->nparts);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a684d..ebc9e90ed8 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -53,6 +53,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
+#include "optimizer/var.h"
 #include "partitioning/partprune.h"
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
@@ -115,6 +116,24 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
+/*
+ * expression_tree_walker context struct for gathering paramids of params
+ * matching the partition key.
+ */
+typedef struct PullParamContext
+{
+	Bitmapset *extparams;
+	Bitmapset *execparams;
+} PullParamContext;
+
+/*
+ * expression_tree_walker context struct for checking if an Expr contains any
+ * Params not listed in 'safeparams'.
+ */
+typedef struct SafeParamContext
+{
+	Bitmapset *safeparams;
+} SafeParamContext;
 
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
@@ -162,6 +181,8 @@ static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
 static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
 						  StrategyNumber opstrategy, Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static void pull_params(Expr *expr, PartitionPruneInfo *pinfo);
+static bool pull_params_walker(Node *node, PullParamContext *context);
 static bool pull_partkey_params(PartitionPruneInfo *pinfo, List *steps);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
@@ -172,6 +193,9 @@ static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
 							   Expr *partkey, Expr **outconst);
 static bool partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value);
+static bool contains_unsafeparams(Expr *expr, Bitmapset *safeparams);
+static bool contains_unsafeparams_walker(Node *node,
+							 SafeParamContext *context);
 
 /*
  * make_partition_pruneinfo
@@ -197,7 +221,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	int		   *relid_subnode_map;
 	int		   *relid_subpart_map;
 	int			i;
-	bool		gotparam = false;
+	bool		gotnonconst = false;
 
 	/*
 	 * Allocate two arrays to store the 1-based indexes of the 'subpaths' and
@@ -326,11 +350,12 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		pinfo->subpart_map = subpart_map;
 
 		/*
-		 * Extract Params matching partition key and record if we got any.
-		 * We'll not bother enabling run-time pruning if no params matched the
-		 * partition key at any level of partitioning.
+		 * Extract Params matching partition key and record if any steps
+		 * compare a non-Const value to the partition key.  If everything
+		 * is Const then we've no need to perform run-time pruning as the
+		 * planner will have already selected the minimum set of partitions.
 		 */
-		gotparam |= pull_partkey_params(pinfo, pruning_steps);
+		gotnonconst |= pull_partkey_params(pinfo, pruning_steps);
 
 		pinfolist = lappend(pinfolist, pinfo);
 	}
@@ -338,14 +363,11 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	pfree(relid_subnode_map);
 	pfree(relid_subpart_map);
 
-	if (gotparam)
+	/* Enable pruning if we got any non-Consts */
+	if (gotnonconst)
 		return pinfolist;
 
-	/*
-	 * If no Params were found to match the partition key on any of the
-	 * partitioned relations then there's no point doing any run-time
-	 * partition pruning.
-	 */
+	/* Run-time pruning would be useless */
 	return NIL;
 }
 
@@ -1478,6 +1500,10 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		if (contain_volatile_functions((Node *) expr))
 			return PARTCLAUSE_UNSUPPORTED;
 
+		/* We can't prune using an expression with Vars. */
+		if (contain_var_clause((Node *) expr))
+			return PARTCLAUSE_UNSUPPORTED;
+
 		/*
 		 * Determine the input types of the operator we're considering.
 		 *
@@ -2682,16 +2708,74 @@ get_matching_range_bounds(PartitionPruneContext *context,
 	return result;
 }
 
+/*
+ * pull_params
+ *		Determine all external and exec params in 'expr' and add the found
+ *		paramids to the appropriate 'pinfo' field.  pinfo's hasparamlessexprs
+ *		field is set if any non-Const expression is found which has no
+ *		parameters.
+ */
+static void
+pull_params(Expr *expr, PartitionPruneInfo *pinfo)
+{
+	PullParamContext context;
+
+	context.extparams = NULL;
+	context.execparams = NULL;
+
+	pull_params_walker((Node *) expr, &context);
+
+	pinfo->extparams = bms_add_members(pinfo->extparams, context.extparams);
+	pinfo->execparams = bms_add_members(pinfo->execparams,
+										context.execparams);
+
+	/* Mark that an Expr has been seen which contains no Params */
+	if (!context.extparams && !context.execparams)
+		pinfo->hasparamlessexprs = true;
+}
+
+static bool
+pull_params_walker(Node *node, PullParamContext *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		switch (param->paramkind)
+		{
+			case PARAM_EXTERN:
+				context->extparams = bms_add_member(context->extparams,
+													param->paramid);
+				break;
+			case PARAM_EXEC:
+				context->execparams = bms_add_member(context->execparams,
+													 param->paramid);
+				break;
+
+			default:
+				elog(ERROR, "unrecognized paramkind: %d",
+					 (int) param->paramkind);
+				break;
+		}
+	}
+	return expression_tree_walker(node, pull_params_walker,
+								  (void *) context);
+}
+
 /*
  * pull_partkey_params
  *		Loop through each pruning step and record each external and exec
  *		Params being compared to the partition keys.
+ *
+ * Returns true if any non-const value is being compared to the partition key.
  */
 static bool
 pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 {
 	ListCell   *lc;
-	bool		gotone = false;
+	bool		gotnonconst = false;
 
 	foreach(lc, steps)
 	{
@@ -2705,32 +2789,15 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 		{
 			Expr	   *expr = lfirst(lc2);
 
-			if (IsA(expr, Param))
-			{
-				Param	   *param = (Param *) expr;
-
-				switch (param->paramkind)
-				{
-					case PARAM_EXTERN:
-						pinfo->extparams = bms_add_member(pinfo->extparams,
-														  param->paramid);
-						break;
-					case PARAM_EXEC:
-						pinfo->execparams = bms_add_member(pinfo->execparams,
-														   param->paramid);
-						break;
+			if (IsA(expr, Const))
+				continue;
 
-					default:
-						elog(ERROR, "unrecognized paramkind: %d",
-							 (int) param->paramkind);
-						break;
-				}
-				gotone = true;
-			}
+			gotnonconst = true;
+			pull_params(expr, pinfo);
 		}
 	}
 
-	return gotone;
+	return gotnonconst;
 }
 
 /*
@@ -3031,38 +3098,64 @@ static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value)
 {
-	switch (nodeTag(expr))
+	if (IsA(expr, Const))
+	{
+		*value = ((Const *) expr)->constvalue;
+		return true;
+	}
+	else
 	{
-		case T_Const:
-			*value = ((Const *) expr)->constvalue;
+		/*
+		 * When being called from the executor we may be able to evaluate the
+		 * expression's value.
+		 */
+		if (context->planstate &&
+			!contains_unsafeparams(expr, context->safeparams))
+		{
+			ExprState  *exprstate;
+			ExprContext *ectx;
+			bool		isNull;
+
+			/* Exprs with volatile functions shouldn't make it here */
+			Assert(!contain_volatile_functions((Node *) expr));
+
+			exprstate = context->exprstates[stateidx];
+			ectx = context->planstate->ps_ExprContext;
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
+			if (isNull)
+				return false;
+
 			return true;
+		}
+	}
+
+	return false;
+}
 
-		case T_Param:
+static bool
+contains_unsafeparams(Expr *expr, Bitmapset *safeparams)
+{
+	SafeParamContext context;
 
-			/*
-			 * When being called from the executor we may be able to evaluate
-			 * the Param's value.
-			 */
-			if (context->planstate &&
-				bms_is_member(((Param *) expr)->paramid, context->safeparams))
-			{
-				ExprState  *exprstate;
-				ExprContext *ectx;
-				bool		isNull;
+	context.safeparams = safeparams;
 
-				exprstate = context->exprstates[stateidx];
-				ectx = context->planstate->ps_ExprContext;
-				*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-				if (isNull)
-					return false;
+	return contains_unsafeparams_walker((Node *) expr, &context);
+}
 
-				return true;
-			}
-			break;
+static bool
+contains_unsafeparams_walker(Node *node, SafeParamContext *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param *param = (Param *) node;
 
-		default:
-			break;
+		/* if param is not on the safe list then return true */
+		if (!bms_is_member(param->paramid, context->safeparams))
+			return true;
+		return false;		/* keep looking */
 	}
-
-	return false;
+	return expression_tree_walker(node, contains_unsafeparams_walker,
+								  (void *) context);
 }
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fc6e9574e3..a48cf72d8c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -136,6 +136,8 @@ typedef struct PartitionTupleRouting
  * extparams					Contains paramids of external params found
  *								matching partition keys in 'pruning_steps'.
  * allparams					As 'extparams' but also including exec params.
+ * hasparamlessexprs			Some pruning steps contain Exprs without any
+ *								Params.
  *-----------------------
  */
 typedef struct PartitionPruningData
@@ -147,6 +149,7 @@ typedef struct PartitionPruningData
 	List	   *pruning_steps;
 	Bitmapset  *extparams;
 	Bitmapset  *allparams;
+	bool		hasparamlessexprs;
 } PartitionPruningData;
 
 /*-----------------------
@@ -163,6 +166,7 @@ typedef struct PartitionPruningData
  *						partitioned relation. First element contains the
  *						details for the target partitioned table.
  * num_partprunedata	Number of items in 'partprunedata' array.
+ * hasparamlessexprs	Some pruning steps contain Exprs without any Params.
  * prune_context		A memory context which can be used to call the query
  *						planner's partition prune functions.
  * extparams			All PARAM_EXTERN paramids which were found to match a
@@ -177,6 +181,7 @@ typedef struct PartitionPruneState
 {
 	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
+	bool		hasparamlessexprs;
 	MemoryContext prune_context;
 	Bitmapset  *extparams;
 	Bitmapset  *execparams;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b2a1..aae102126f 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1597,6 +1597,8 @@ typedef struct PartitionPruneInfo
 	List	   *pruning_steps;	/* List of PartitionPruneStep */
 	Bitmapset  *present_parts;	/* Indexes of all partitions which subnodes
 								 * are present for. */
+	bool		hasparamlessexprs;	/* True if Exprs exist which don't contain
+									 * any Params */
 	int			nparts;			/* The length of the following two arrays */
 	int		   *subnode_map;	/* subnode index by partition id, or -1 */
 	int		   *subpart_map;	/* subpart index by partition id, or -1 */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index cf331e79c1..64b4e933d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1835,6 +1835,54 @@ fetch backward all from cur;
 (2 rows)
 
 commit;
+begin;
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=1 loops=1)
+   Subplans Removed: 3
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(1))
+(4 rows)
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=4 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part2 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part3 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part4 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+(9 rows)
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part2 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part3 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part4 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+(13 rows)
+
+rollback;
 drop table list_part;
 -- Parallel append
 -- Suppress the number of loops each parallel node runs for.  This is because
@@ -2079,6 +2127,40 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
 (27 rows)
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+                                      explain_parallel_append                                      
+---------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=2 loops=1)
+         Workers Planned: 1
+         Workers Launched: 1
+         ->  Partial Aggregate (actual rows=1 loops=2)
+               ->  Nested Loop (actual rows=0 loops=2)
+                     ->  Parallel Seq Scan on lprt_a a (actual rows=51 loops=N)
+                           Filter: (a = ANY ('{0,0,1}'::integer[]))
+                     ->  Append (actual rows=0 loops=102)
+                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+(27 rows)
+
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
                                       explain_parallel_append                                      
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1464f4dcd9..b6681fa44c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -396,6 +396,22 @@ fetch backward all from cur;
 
 commit;
 
+begin;
+
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+
+rollback;
+
 drop table list_part;
 
 -- Parallel append
@@ -486,6 +502,10 @@ set enable_mergejoin = 0;
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+
 insert into lprt_a values(3),(3);
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
#18Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#17)
Re: why partition pruning doesn't work?

On 2018/06/06 14:10, David Rowley wrote:

> I then decided that I didn't like the way we need to check which
> params are in the Expr each time we call partkey_datum_from_expr. It
> seems better to prepare this in advance when building the pruning
> steps. I started work on that, but soon realised that I'd need to pass
> a List of Bitmapsets to the executor. This is a problem as Bitmapset is
> not a Node type and cannot be copied with COPY_NODE_FIELD(). Probably
> this could be refactored so that, instead of passing 3 Lists in the
> PartitionPruneStepOp, we invent a new node type that just has 3 fields
> and store a single List of those.

I wonder why we need to create those Bitmapsets in the planner? Why not
in ExecSetupPartitionPruneState()? For example, like how
context->exprstates is initialized.

Thanks,
Amit

#19David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#18)
1 attachment(s)
Re: why partition pruning doesn't work?

On 6 June 2018 at 18:05, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

> On 2018/06/06 14:10, David Rowley wrote:
>
>> I then decided that I didn't like the way we need to check which
>> params are in the Expr each time we call partkey_datum_from_expr. It
>> seems better to prepare this in advance when building the pruning
>> steps. I started work on that, but soon realised that I'd need to pass
>> a List of Bitmapsets to the executor. This is a problem as Bitmapset is
>> not a Node type and cannot be copied with COPY_NODE_FIELD(). Probably
>> this could be refactored so that, instead of passing 3 Lists in the
>> PartitionPruneStepOp, we invent a new node type that just has 3 fields
>> and store a single List of those.
>
> I wonder why we need to create those Bitmapsets in the planner? Why not
> in ExecSetupPartitionPruneState()? For example, like how
> context->exprstates is initialized.

That seems like a good idea. Certainly much better than working them
out each time we prune.

v3 patch attached.
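
To illustrate the difference between working the param sets out on every
prune and working them out once at setup, here is a minimal standalone
sketch. It is not code from the patch: ExprNode, collect_paramids and
expr_is_evaluable are hypothetical names, a 64-bit mask stands in for a
Bitmapset, and the subset test is only an assumption about how a stored
per-expression set might be consulted at prune time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy expression tree; a stand-in for real Node trees */
typedef enum { NODE_CONST, NODE_PARAM, NODE_FUNC } NodeKind;

typedef struct ExprNode
{
    NodeKind    kind;
    int         paramid;                /* NODE_PARAM only, range 0..63 */
    const struct ExprNode *args[2];     /* NODE_FUNC only, may be NULL */
} ExprNode;

/*
 * Setup-time work: walk the tree once and record every paramid it
 * references as a bit in the returned mask.
 */
static uint64_t
collect_paramids(const ExprNode *node)
{
    uint64_t    ids = 0;
    int         i;

    if (node == NULL)
        return 0;
    if (node->kind == NODE_PARAM)
    {
        assert(node->paramid >= 0 && node->paramid < 64);
        return (uint64_t) 1 << node->paramid;
    }
    if (node->kind == NODE_FUNC)
    {
        for (i = 0; i < 2; i++)
            ids |= collect_paramids(node->args[i]);
    }
    return ids;
}

/*
 * Prune-time work: no tree walk, just check that every param the
 * expression needs is in the set of params whose values are known.
 */
static bool
expr_is_evaluable(uint64_t exprparamids, uint64_t safeparams)
{
    return (exprparamids & ~safeparams) == 0;
}
```

An expression whose stored set is empty needs no params at all, which
corresponds to the case the patch tracks with hasparamlessexprs.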

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

run-time_pruning_for_exprs_v3.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c83991c93c..2055b81d4f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1349,7 +1349,11 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * difference between these that we need to concern ourselves with is the
  * time when the values of the Params are known.  External Param values are
  * known at any time of execution, including executor startup, but exec Param
- * values are only known when the executor is running.
+ * values are only known when the executor is running.  We also support
+ * pruning using any stable expression which does not contain any Vars.
+ * Immutable expressions would have been evaluated to a Const during planning,
+ * so plan-time pruning would have taken care of any pruning.  It's not
+ * possible for pruning to take place using volatile expressions.
  *
  * For external Params we may be able to prune away unneeded partitions
  * during executor startup.  This has the added benefit of not having to
@@ -1418,6 +1422,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 	prunestate->num_partprunedata = list_length(partitionpruneinfo);
 	prunestate->extparams = NULL;
 	prunestate->execparams = NULL;
+	prunestate->hasparamlessexprs = false;
 
 	/*
 	 * Create a sub memory context which we'll use when making calls to the
@@ -1456,6 +1461,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 
 		/* We can use the subpart_map verbatim, since we never modify it */
 		pprune->subpart_map = pinfo->subpart_map;
+		pprune->hasparamlessexprs = false;
 
 		/*
 		 * Grab some info from the table's relcache; lock was already obtained
@@ -1478,6 +1484,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		context->planstate = planstate;
 		context->safeparams = NULL; /* empty for now */
 		context->exprstates = palloc0(sizeof(ExprState *) * n_steps * partnatts);
+		context->exprparamids = palloc0(sizeof(Bitmapset *) * n_steps * partnatts);
 
 		/* Initialize expression states for each expression */
 		foreach(lc2, pinfo->pruning_steps)
@@ -1505,6 +1512,11 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 												step->step.step_id, keyno);
 					context->exprstates[stateidx] =
 						ExecInitExpr(expr, context->planstate);
+					context->exprparamids[stateidx] = pull_paramids(expr);
+
+					if (!pprune->hasparamlessexprs &&
+						bms_is_empty(context->exprparamids[stateidx]))
+						pprune->hasparamlessexprs = true;
 				}
 				keyno++;
 			}
@@ -1524,6 +1536,8 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		prunestate->execparams = bms_add_members(prunestate->execparams,
 												 pinfo->execparams);
 
+		prunestate->hasparamlessexprs |= pprune->hasparamlessexprs;
+
 		relation_close(rel, NoLock);
 
 		i++;
@@ -1566,10 +1580,11 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	Bitmapset  *result = NULL;
 
 	/*
-	 * Ensure there's actually external params, or we've not been called
-	 * already.
+	 * If there are no parameter-less Exprs then ensure there's actually
+	 * external params, or we've not been called already.
 	 */
-	Assert(!bms_is_empty(prunestate->extparams));
+	Assert(prunestate->hasparamlessexprs ||
+		   !bms_is_empty(prunestate->extparams));
 
 	pprune = prunestate->partprunedata;
 
@@ -1739,14 +1754,14 @@ find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 
 	/*
 	 * We only need to determine the matching partitions if there are any
-	 * params matching the partition key at this level.  If there are no
-	 * matching params, then we can simply return all subnodes which belong to
-	 * this parent partition.  The planner should have already determined
-	 * these to be the minimum possible set.  We must still recursively visit
-	 * any subpartitioned tables as we may find their partition keys match
-	 * some Params at their level.
+	 * params matching the partition key at this level, or if there are any
+	 * parameter-less expressions matching the partition key.  However, if
+	 * it's just parameter-less expressions then we only prune during
+	 * ExecFindInitialMatchingSubPlans, there's no point in doing this from
+	 * ExecFindMatchingSubPlans too, once is enough.
 	 */
-	if (!bms_is_empty(pruneparams))
+	if ((!allparams && pprune->hasparamlessexprs) ||
+		!bms_is_empty(pruneparams))
 	{
 		context->safeparams = pruneparams;
 		partset = get_matching_partitions(context,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6bc3e470bf..275254aa65 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -139,10 +139,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 												  node->part_prune_infos);
 
 		/*
-		 * When there are external params matching the partition key we may be
-		 * able to prune away Append subplans now.
+		 * When there are parameter-less exprs or any external params matching
+		 * the partition key we may be able to prune away Append subplans now.
 		 */
-		if (!bms_is_empty(prunestate->extparams))
+		if (prunestate->hasparamlessexprs ||
+			!bms_is_empty(prunestate->extparams))
 		{
 			/* Determine which subplans match the external params */
 			validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a684d..a78013bf86 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -53,6 +53,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
+#include "optimizer/var.h"
 #include "partitioning/partprune.h"
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
@@ -115,6 +116,15 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
+/*
+ * expression_tree_walker context struct for gathering paramids of params
+ * matching the partition key.
+ */
+typedef struct PullParamContext
+{
+	Bitmapset *extparams;
+	Bitmapset *execparams;
+} PullParamContext;
 
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
@@ -162,6 +172,8 @@ static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
 static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
 						  StrategyNumber opstrategy, Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static void pull_params(Expr *expr, PartitionPruneInfo *pinfo);
+static bool pull_params_walker(Node *node, PullParamContext *context);
 static bool pull_partkey_params(PartitionPruneInfo *pinfo, List *steps);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
@@ -197,7 +209,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	int		   *relid_subnode_map;
 	int		   *relid_subpart_map;
 	int			i;
-	bool		gotparam = false;
+	bool		gotnonconst = false;
 
 	/*
 	 * Allocate two arrays to store the 1-based indexes of the 'subpaths' and
@@ -326,11 +338,12 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		pinfo->subpart_map = subpart_map;
 
 		/*
-		 * Extract Params matching partition key and record if we got any.
-		 * We'll not bother enabling run-time pruning if no params matched the
-		 * partition key at any level of partitioning.
+		 * Extract Params matching partition key and record if any steps
+		 * compare a non-Const value to the partition key.  If everything
+		 * is Const then we've no need to perform run-time pruning as the
+		 * planner will have already selected the minimum set of partitions.
 		 */
-		gotparam |= pull_partkey_params(pinfo, pruning_steps);
+		gotnonconst |= pull_partkey_params(pinfo, pruning_steps);
 
 		pinfolist = lappend(pinfolist, pinfo);
 	}
@@ -338,14 +351,11 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	pfree(relid_subnode_map);
 	pfree(relid_subpart_map);
 
-	if (gotparam)
+	/* Enable pruning if we got any non-Consts */
+	if (gotnonconst)
 		return pinfolist;
 
-	/*
-	 * If no Params were found to match the partition key on any of the
-	 * partitioned relations then there's no point doing any run-time
-	 * partition pruning.
-	 */
+	/* Run-time pruning would be useless */
 	return NIL;
 }
 
@@ -447,6 +457,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
 	context.planstate = NULL;
 	context.safeparams = NULL;
 	context.exprstates = NULL;
+	context.exprparamids = NULL;
 
 	/* Actual pruning happens here. */
 	partindexes = get_matching_partitions(&context, pruning_steps);
@@ -1478,6 +1489,10 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		if (contain_volatile_functions((Node *) expr))
 			return PARTCLAUSE_UNSUPPORTED;
 
+		/* We can't prune using an expression with Vars. */
+		if (contain_var_clause((Node *) expr))
+			return PARTCLAUSE_UNSUPPORTED;
+
 		/*
 		 * Determine the input types of the operator we're considering.
 		 *
@@ -2682,16 +2697,85 @@ get_matching_range_bounds(PartitionPruneContext *context,
 	return result;
 }
 
+/*
+ * pull_paramids
+ *		Returns a Bitmapset containing the paramids of each Param in 'expr'.
+ */
+Bitmapset *
+pull_paramids(Expr *expr)
+{
+	PullParamContext context;
+
+	context.extparams = NULL;
+	context.execparams = NULL;
+
+	pull_params_walker((Node *) expr, &context);
+
+	return bms_union(context.extparams, context.execparams);
+}
+
+/*
+ * pull_params
+ *		Determine all external and exec params in 'expr' and add the found
+ *		paramids to the appropriate 'pinfo' field.
+ */
+static void
+pull_params(Expr *expr, PartitionPruneInfo *pinfo)
+{
+	PullParamContext context;
+
+	context.extparams = NULL;
+	context.execparams = NULL;
+
+	pull_params_walker((Node *) expr, &context);
+
+	pinfo->extparams = bms_add_members(pinfo->extparams, context.extparams);
+	pinfo->execparams = bms_add_members(pinfo->execparams,
+										context.execparams);
+}
+
+static bool
+pull_params_walker(Node *node, PullParamContext *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		switch (param->paramkind)
+		{
+			case PARAM_EXTERN:
+				context->extparams = bms_add_member(context->extparams,
+													param->paramid);
+				break;
+			case PARAM_EXEC:
+				context->execparams = bms_add_member(context->execparams,
+													 param->paramid);
+				break;
+
+			default:
+				elog(ERROR, "unrecognized paramkind: %d",
+					 (int) param->paramkind);
+				break;
+		}
+	}
+	return expression_tree_walker(node, pull_params_walker,
+								  (void *) context);
+}
+
 /*
  * pull_partkey_params
  *		Loop through each pruning step and record each external and exec
  *		Params being compared to the partition keys.
+ *
+ * Returns true if any non-const value is being compared to the partition key.
  */
 static bool
 pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 {
 	ListCell   *lc;
-	bool		gotone = false;
+	bool		gotnonconst = false;
 
 	foreach(lc, steps)
 	{
@@ -2705,32 +2789,15 @@ pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
 		{
 			Expr	   *expr = lfirst(lc2);
 
-			if (IsA(expr, Param))
-			{
-				Param	   *param = (Param *) expr;
-
-				switch (param->paramkind)
-				{
-					case PARAM_EXTERN:
-						pinfo->extparams = bms_add_member(pinfo->extparams,
-														  param->paramid);
-						break;
-					case PARAM_EXEC:
-						pinfo->execparams = bms_add_member(pinfo->execparams,
-														   param->paramid);
-						break;
+			if (IsA(expr, Const))
+				continue;
 
-					default:
-						elog(ERROR, "unrecognized paramkind: %d",
-							 (int) param->paramkind);
-						break;
-				}
-				gotone = true;
-			}
+			gotnonconst = true;
+			pull_params(expr, pinfo);
 		}
 	}
 
-	return gotone;
+	return gotnonconst;
 }
 
 /*
@@ -3031,37 +3098,38 @@ static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value)
 {
-	switch (nodeTag(expr))
+	if (IsA(expr, Const))
 	{
-		case T_Const:
-			*value = ((Const *) expr)->constvalue;
-			return true;
-
-		case T_Param:
-
-			/*
-			 * When being called from the executor we may be able to evaluate
-			 * the Param's value.
-			 */
-			if (context->planstate &&
-				bms_is_member(((Param *) expr)->paramid, context->safeparams))
-			{
-				ExprState  *exprstate;
-				ExprContext *ectx;
-				bool		isNull;
+		*value = ((Const *) expr)->constvalue;
+		return true;
+	}
+	else
+	{
+		/*
+		 * When called from the executor we'll have a valid planstate so we
+		 * may be able to evaluate the expression.  However, we must ensure
+		 * we don't try to do this if the expression contains parameters which
+		 * we're unable to evaluate at this time.
+		 */
+		if (context->planstate &&
+			!bms_nonempty_difference(context->exprparamids[stateidx],
+									 context->safeparams))
+		{
+			ExprState  *exprstate;
+			ExprContext *ectx;
+			bool		isNull;
 
-				exprstate = context->exprstates[stateidx];
-				ectx = context->planstate->ps_ExprContext;
-				*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-				if (isNull)
-					return false;
+			/* Exprs with volatile functions shouldn't make it here */
+			Assert(!contain_volatile_functions((Node *) expr));
 
-				return true;
-			}
-			break;
+			exprstate = context->exprstates[stateidx];
+			ectx = context->planstate->ps_ExprContext;
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
+			if (isNull)
+				return false;
 
-		default:
-			break;
+			return true;
+		}
 	}
 
 	return false;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fc6e9574e3..a48cf72d8c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -136,6 +136,8 @@ typedef struct PartitionTupleRouting
  * extparams					Contains paramids of external params found
  *								matching partition keys in 'pruning_steps'.
  * allparams					As 'extparams' but also including exec params.
+ * hasparamlessexprs			Some pruning steps contain Exprs without any
+ *								Params.
  *-----------------------
  */
 typedef struct PartitionPruningData
@@ -147,6 +149,7 @@ typedef struct PartitionPruningData
 	List	   *pruning_steps;
 	Bitmapset  *extparams;
 	Bitmapset  *allparams;
+	bool		hasparamlessexprs;
 } PartitionPruningData;
 
 /*-----------------------
@@ -163,6 +166,7 @@ typedef struct PartitionPruningData
  *						partitioned relation. First element contains the
  *						details for the target partitioned table.
  * num_partprunedata	Number of items in 'partprunedata' array.
+ * hasparamlessexprs	Some pruning steps contain Exprs without any Params.
  * prune_context		A memory context which can be used to call the query
  *						planner's partition prune functions.
  * extparams			All PARAM_EXTERN paramids which were found to match a
@@ -177,6 +181,7 @@ typedef struct PartitionPruneState
 {
 	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
+	bool		hasparamlessexprs;
 	MemoryContext prune_context;
 	Bitmapset  *extparams;
 	Bitmapset  *execparams;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 3d114b4c71..14fc9aea9c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -57,6 +57,9 @@ typedef struct PartitionPruneContext
 	 * otherwise NULL.
 	 */
 	ExprState **exprstates;
+
+	/* Array of Bitmapsets, one for each exprstates */
+	Bitmapset **exprparamids;
 } PartitionPruneContext;
 
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
@@ -67,5 +70,6 @@ extern List *make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
 						List *pruning_steps);
+extern Bitmapset *pull_paramids(Expr *expr);
 
 #endif							/* PARTPRUNE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index cf331e79c1..64b4e933d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1835,6 +1835,54 @@ fetch backward all from cur;
 (2 rows)
 
 commit;
+begin;
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=1 loops=1)
+   Subplans Removed: 3
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(1))
+(4 rows)
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=4 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part2 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part3 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part4 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+(9 rows)
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part2 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part3 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part4 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+(13 rows)
+
+rollback;
 drop table list_part;
 -- Parallel append
 -- Suppress the number of loops each parallel node runs for.  This is because
@@ -2079,6 +2127,40 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
 (27 rows)
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+                                      explain_parallel_append                                      
+---------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=2 loops=1)
+         Workers Planned: 1
+         Workers Launched: 1
+         ->  Partial Aggregate (actual rows=1 loops=2)
+               ->  Nested Loop (actual rows=0 loops=2)
+                     ->  Parallel Seq Scan on lprt_a a (actual rows=51 loops=N)
+                           Filter: (a = ANY ('{0,0,1}'::integer[]))
+                     ->  Append (actual rows=0 loops=102)
+                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+(27 rows)
+
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
                                       explain_parallel_append                                      
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1464f4dcd9..b6681fa44c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -396,6 +396,22 @@ fetch backward all from cur;
 
 commit;
 
+begin;
+
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+
+rollback;
+
 drop table list_part;
 
 -- Parallel append
@@ -486,6 +502,10 @@ set enable_mergejoin = 0;
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+
 insert into lprt_a values(3),(3);
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
#20Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#19)
Re: why partition pruning doesn't work?

On 2018/06/06 18:52, David Rowley wrote:

On 6 June 2018 at 18:05, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/06/06 14:10, David Rowley wrote:

I then decided that
I didn't like the way we need to check which params are in the Expr
each time we call partkey_datum_from_expr. It seems better to prepare
this in advance when building the pruning steps. I started work on
that, but soon realised that I'd need to pass a List of Bitmapsets to
the executor. This is a problem as Bitmapset is not a Node type and
cannot be copied with COPY_NODE_FIELD(). Probably this could be
refactored so that, instead of passing 3 Lists in the
PartitionPruneStepOp, we invent a new node type that just has 3 fields
and store a single List.

I wonder why we need to create those Bitmapsets in the planner? Why not
in ExecSetupPartitionPruneState()? For example, like how
context->exprstates is initialized.

That seems like a good idea. Certainly much better than working them
out each time we prune.

v3 patch attached.

Thanks David. This one looks good. I also like it that hasparamlessexprs
is no longer determined and set in the planner.

I checked what happens with the cases that Ashutosh complained about
upthread, and it seems that pruning works as expected.

create table t1 (a int, b int) partition by range (a);
create table t1p1 partition of t1 for values from (0) to (100);
create table t1p2 partition of t1 for values from (100) to (200);
create index on t1 (a);
insert into t1 select i, i from generate_series(0, 199) i;

explain (costs off, analyze) select * from t1 x left join t1 y on x.a = y.b + 100 where y.a = 5;
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop (actual time=0.294..0.371 rows=1 loops=1)
   ->  Append (actual time=0.067..0.092 rows=1 loops=1)
         ->  Bitmap Heap Scan on t1p1 y (actual time=0.049..0.059 rows=1 loops=1)
               Recheck Cond: (a = 5)
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on t1p1_a_idx (actual time=0.022..0.022 rows=1 loops=1)
                     Index Cond: (a = 5)
   ->  Append (actual time=0.192..0.219 rows=1 loops=1)
         ->  Index Scan using t1p1_a_idx on t1p1 x (never executed)
               Index Cond: (a = (y.b + 100))
         ->  Index Scan using t1p2_a_idx on t1p2 x_1 (actual time=0.134..0.145 rows=1 loops=1)
               Index Cond: (a = (y.b + 100))
 Planning Time: 5.314 ms
 Execution Time: 0.938 ms
(14 rows)

Note that the condition x.a = y.b + 100 is able to prune t1p1, whereas on
HEAD it isn't.
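
To spell out why only t1p2 can match: the rows were inserted with b equal to a, so for y.a = 5 the expression y.b + 100 evaluates to 105, which falls in t1p2's range of 100 (inclusive) to 200 (exclusive). The lookup can be sketched in C as below. This is only an illustration with hypothetical names (`RangePart`, `find_range_partition`, `t1_parts`); it is not how PostgreSQL searches partition bounds internally.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical description of one range partition.  The range is half-open,
 * [lower, upper), matching FOR VALUES FROM (lower) TO (upper).
 */
typedef struct RangePart
{
	const char *name;
	int			lower;		/* inclusive lower bound */
	int			upper;		/* exclusive upper bound */
} RangePart;

/* The two partitions of t1 from the example above. */
const RangePart t1_parts[] = {
	{"t1p1", 0, 100},
	{"t1p2", 100, 200},
};

/*
 * Return the index of the partition whose range contains 'value', or -1 if
 * no partition accepts it.  A linear scan is enough for a sketch.
 */
int
find_range_partition(const RangePart *parts, size_t nparts, int value)
{
	size_t		i;

	assert(parts != NULL);

	for (i = 0; i < nparts; i++)
	{
		if (value >= parts[i].lower && value < parts[i].upper)
			return (int) i;
	}
	return -1;
}
```

Calling `find_range_partition(t1_parts, 2, 5 + 100)` selects the second entry (t1p2), which is consistent with the scan on t1p1 x being reported as "never executed" in the plan.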

Thanks,
Amit

#21David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#20)
Re: why partition pruning doesn't work?

On 7 June 2018 at 14:51, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Thanks David. This one looks good. I also like it that hasparamlessexprs
is no longer determined and set in the planner.

Thanks for checking it.

I checked what happens with the cases that Ashutosh complained about
upthread, and it seems that pruning works as expected.

[...]

explain (costs off, analyze) select * from t1 x left join t1 y on x.a =
y.b + 100 where y.a = 5;

Yeah, I added a test to partition_prune.sql that verifies a similar case.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#22Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#21)
Re: why partition pruning doesn't work?

On Thu, Jun 7, 2018 at 8:51 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 7 June 2018 at 14:51, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Thanks David. This one looks good. I also like it that hasparamlessexprs
is no longer determined and set in the planner.

Thanks for checking it.

I checked what happens with the cases that Ashutosh complained about
upthread, and it seems that pruning works as expected.

[...]

explain (costs off, analyze) select * from t1 x left join t1 y on x.a =
y.b + 100 where y.a = 5;

Yeah, I added a test to partition_prune.sql that verifies a similar case.

Thanks for taking care of that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#19)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

On 6 June 2018 at 18:05, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

I wonder why we need to create those Bitmapsets in the planner? Why not
in ExecSetupPartitionPruneState()? For example, like how
context->exprstates is initialized.

That seems like a good idea. Certainly much better than working them
out each time we prune.

v3 patch attached.

Maybe there's something I'm missing here, but I sort of hoped that this
patch would nuke all the special-case code for Params in this area.
Why is there any need to distinguish them from other stable expressions?

IOW, I was hoping for the code to end up simpler, not more complicated.

regards, tom lane

#24David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#23)
Re: why partition pruning doesn't work?

On 8 June 2018 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe there's something I'm missing here, but I sort of hoped that this
patch would nuke all the special-case code for Params in this area.
Why is there any need to distinguish them from other stable expressions?

IOW, I was hoping for the code to end up simpler, not more complicated.

We need to know which Params exist in the Expr because, if there are
no Params or only external Params, we can run-time prune during
startup of the executor. Otherwise, we must defer the pruning until
execution.

I really don't want to say goodbye to that optimisation as it's a
significant win to save having to initialise the subnodes for all the
useless partitions for OLTP type queries.
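
The rule described here can be sketched in C as follows. This is only an illustration under assumptions: `ParamKindSketch`, `PrunePhase` and `choose_prune_phase` are hypothetical names that do not appear in the patch, and the real code records paramids in Bitmapsets rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two Param kinds discussed in the thread. */
typedef enum ParamKindSketch
{
	SKETCH_PARAM_EXTERN,	/* value is known from executor startup onwards */
	SKETCH_PARAM_EXEC		/* value is only known while the executor runs */
} ParamKindSketch;

/* When the expression can first be used for run-time pruning. */
typedef enum PrunePhase
{
	PRUNE_AT_STARTUP,
	PRUNE_DURING_EXECUTION
} PrunePhase;

/*
 * Decide the earliest phase at which an expression containing the given
 * Params can be evaluated.  No Params, or only external Params, allows
 * pruning at executor startup; any exec Param forces a delay.
 */
PrunePhase
choose_prune_phase(const ParamKindSketch *params, size_t nparams)
{
	size_t		i;

	assert(params != NULL || nparams == 0);

	for (i = 0; i < nparams; i++)
	{
		if (params[i] == SKETCH_PARAM_EXEC)
			return PRUNE_DURING_EXECUTION;
	}
	return PRUNE_AT_STARTUP;
}
```

An expression with no Params at all (for example a call to a stable function with constant arguments) passes an empty list and is classified as prunable at startup, which is the case where initialising the subnodes of useless partitions is avoided.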

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#24)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

On 8 June 2018 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe there's something I'm missing here, but I sort of hoped that this
patch would nuke all the special-case code for Params in this area.
Why is there any need to distinguish them from other stable expressions?

We need to know which Params exist in the Expr because, if there are
no Params or only external Params, we can run-time prune during
startup of the executor.

This does not refute my question. Why doesn't the same logic apply
to any stable expression? That is, ISTM a Param is a special case
of that.

regards, tom lane

#26Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#25)
Re: why partition pruning doesn't work?

On Fri, Jun 8, 2018 at 8:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 8 June 2018 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe there's something I'm missing here, but I sort of hoped that this
patch would nuke all the special-case code for Params in this area.
Why is there any need to distinguish them from other stable expressions?

We need to know which Params exist in the Expr because, if there are
no Params or only external Params, we can run-time prune during
startup of the executor.

This does not refute my question. Why doesn't the same logic apply
to any stable expression? That is, ISTM a Param is a special case
of that.

+1.

I don't think we need to perform pruning at the start of execution,
but we could fold all the stable expressions to constants at that
time. The PARAM_EXECs cannot be folded into constants at execution
start since they have not been assigned any values yet. AFAIU, expressions
within a given node that use those parameters can be folded into
constants (if possible) during ExecRewind() on that node. We have to
perform pruning just before the (Merge)Append node scan starts.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#27David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#25)
Re: why partition pruning doesn't work?

On 8 June 2018 at 15:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 8 June 2018 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe there's something I'm missing here, but I sort of hoped that this
patch would nuke all the special-case code for Params in this area.
Why is there any need to distinguish them from other stable expressions?

We need to know which Params exist in the Expr because, if there are
no Params or only external Params, we can run-time prune during
startup of the executor.

This does not refute my question. Why doesn't the same logic apply
to any stable expression? That is, ISTM a Param is a special case
of that.

Okay, maybe we don't need to know which external params exist, but we
need to know if there are any exec params so that we don't try to
evaluate an expression with any of those during executor startup.

I'll produce a patch which simplifies things in that area.

Thanks for looking at this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#28David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#27)
1 attachment(s)
Re: why partition pruning doesn't work?

On 8 June 2018 at 18:14, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 8 June 2018 at 15:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 8 June 2018 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe there's something I'm missing here, but I sort of hoped that this
patch would nuke all the special-case code for Params in this area.
Why is there any need to distinguish them from other stable expressions?

We need to know which Params exist in the Expr because, if there are
no Params or only external Params, we can run-time prune during
startup of the executor.

This does not refute my question. Why doesn't the same logic apply
to any stable expression? That is, ISTM a Param is a special case
of that.

Okay, maybe we don't need to know which external params exist, but we
need to know if there are any exec params so that we don't try to
evaluate an expression with any of those during executor startup.

I'll produce a patch which simplifies things in that area.

Okay, I've gotten rid of the tracking of external params. We now just
track exec params. We still need to know about these so we know if a
re-prune is required during ExecReScanAppend(). Obviously, we don't
want to prune on any random Param change, so I'm fairly sure it's a
good idea to keep track of these.

I've changed the code inside partkey_datum_from_expr so that it's a
simple bool array lookup to decide if we can evaluate the expression
or not. This bool array is populated during planning, which I think is
rather nice so we don't have to go and do it each time the plan is
executed.
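
A rough C sketch of what such a lookup might look like is given below. The struct and field names (`PruneContextSketch`, `has_exec_param`, `executor_running`) are assumptions made for illustration only; the attached patch defines the real fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, cut-down pruning context.  'has_exec_param' holds one flag
 * per pruning expression and is assumed to be filled in once at plan time.
 */
typedef struct PruneContextSketch
{
	bool		executor_running;	/* false while in executor startup */
	const bool *has_exec_param;		/* true if expr contains an exec Param */
	size_t		nexprs;				/* number of entries in has_exec_param */
} PruneContextSketch;

/*
 * Decide whether the expression at 'idx' may be evaluated now.  Expressions
 * without exec Params are always safe; those with exec Params are only safe
 * once the executor is running.
 */
bool
can_evaluate_expr(const PruneContextSketch *cxt, size_t idx)
{
	assert(cxt != NULL && idx < cxt->nexprs);

	return cxt->executor_running || !cxt->has_exec_param[idx];
}
```

The point of the design is that the per-call work is a single array access, with no need to walk the expression tree or compare Bitmapsets each time pruning is attempted.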

I also discovered that I was needlessly running the pruning code again
during executor run in some cases where there was no possibility of
doing any further pruning there. I've had to add some new code to set
the present_parts inside ExecFindInitialMatchingSubPlans(). It now
properly removes the member of any sub-partitions which have had all
of their partitions pruned. This allows us just to use 'present_parts'
to calculate the subnodes from, rather than going and calling the
pruning code again.

Technically PartitionPruningData does not really need the
do_exec_prune field. A non-empty execparams could indicate this, but I
felt it was better to have the bool so that we have one for each
method of run-time pruning. This also saves a bms_is_empty() call
inside find_subplans_for_params_recurse(). This could be a bit of a
hotspot during parameterized nested loops which cause partition
pruning.

I'm really hoping this is what you meant about the special-case code for Params.

Does this look any better?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

run-time_pruning_for_exprs_v4.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c83991c93c..92b74077e9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -50,7 +50,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
 static void find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 								 PartitionPruningData *pprune,
-								 bool allparams,
+								 bool initial_prune,
 								 Bitmapset **validsubplans);
 
 
@@ -1345,18 +1345,16 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * pruning to be performed for values which are only determined during
  * execution, we must make an additional pruning attempt during execution.
  *
- * Here we support pruning using both external and exec Params.  The main
- * difference between these that we need to concern ourselves with is the
- * time when the values of the Params are known.  External Param values are
- * known at any time of execution, including executor startup, but exec Param
- * values are only known when the executor is running.
+ * Here we support pruning using any non-volatile expression. This allows
+ * pruning to be performed using both external and exec Params, along with
+ * stable functions calls and expressions.  Any expression containing an exec
+ * Param must not be evaluated until during execution. For everything else, we
+ * can perform pruning during executor startup.
  *
- * For external Params we may be able to prune away unneeded partitions
- * during executor startup.  This has the added benefit of not having to
- * initialize the unneeded subnodes at all.  This is useful as it can save
- * quite a bit of effort during executor startup.
+ * Having the ability to prune away unneeded subnodes during executor startup
+ * has the added benefit of not having to initialize the unneeded subnodes at
+ * all.
  *
- * For exec Params, we must delay pruning until the executor is running.
  *
  * Functions:
  *
@@ -1369,19 +1367,20 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *		planner's partition prune function into subnode indexes.
  *
  * ExecFindInitialMatchingSubPlans:
- *		Returns indexes of matching subnodes utilizing only external Params
- *		to eliminate subnodes.  The function must only be called during
- *		executor startup for the given node before the subnodes themselves
- *		are initialized.  Subnodes which are found not to match by this
- *		function must not be included in the node's list of subnodes as this
- *		function performs a remap of the partition index to subplan index map
- *		and the newly created map provides indexes only for subnodes which
- *		remain after calling this function.
+ *		Returns indexes of matching subnodes.  Partition pruning is attempted
+ *		without any evaluation of expressions containing exec Params.  This
+ *		function must only be called during executor startup for the given
+ *		node before the subnodes themselves are initialized.  Subnodes which
+ *		are found not to match by this function must not be included in the
+ *		node's list of subnodes as this function performs a remap of the
+ *		partition index to subplan index map and the newly created map
+ *		provides indexes only for subnodes which remain after calling this
+ *		function.
  *
  * ExecFindMatchingSubPlans:
- *		Returns indexes of matching subnodes utilizing all Params to eliminate
- *		subnodes which can't possibly contain matching tuples.  This function
- *		can only be called while the executor is running.
+ *		Returns indexes of matching subnodes evaluating all possible
+ *		expressions.  This function can only be called while the executor is
+ *		running.
  *-------------------------------------------------------------------------
  */
 
@@ -1416,8 +1415,8 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 	 */
 	prunestate->partprunedata = prunedata;
 	prunestate->num_partprunedata = list_length(partitionpruneinfo);
-	prunestate->extparams = NULL;
 	prunestate->execparams = NULL;
+	prunestate->do_initial_prune = false;	/* may be set below */
 
 	/*
 	 * Create a sub memory context which we'll use when making calls to the
@@ -1444,7 +1443,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		int			partnatts;
 		int			n_steps;
 
-		pprune->present_parts = bms_copy(pinfo->present_parts);
+		pprune->present_parts = pinfo->present_parts;
 		pprune->subnode_map = palloc(sizeof(int) * pinfo->nparts);
 
 		/*
@@ -1476,9 +1475,14 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		context->nparts = pinfo->nparts;
 		context->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
 		context->planstate = planstate;
-		context->safeparams = NULL; /* empty for now */
 		context->exprstates = palloc0(sizeof(ExprState *) * n_steps * partnatts);
 
+		/*
+		 * Borrow the planner's hasexecparam array.  It is not modified
+		 * anywhere, so there is no need to copy it.
+		 */
+		context->exprhasexecparam = pinfo->hasexecparam;
+
 		/* Initialize expression states for each expression */
 		foreach(lc2, pinfo->pruning_steps)
 		{
@@ -1511,31 +1515,25 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		}
 
 		pprune->pruning_steps = pinfo->pruning_steps;
-		pprune->extparams = bms_copy(pinfo->extparams);
-		pprune->allparams = bms_union(pinfo->extparams, pinfo->execparams);
+		pprune->execparams = bms_copy(pinfo->execparams);
+		pprune->do_initial_prune = pinfo->do_initial_prune;
+		pprune->do_exec_prune = pinfo->do_exec_prune;
 
 		/*
-		 * Accumulate the paramids which match the partitioned keys of all
-		 * partitioned tables.
+		 * Accumulate the exec paramids which match the partitioned keys of
+		 * all partitioned tables.
 		 */
-		prunestate->extparams = bms_add_members(prunestate->extparams,
-												pinfo->extparams);
-
 		prunestate->execparams = bms_add_members(prunestate->execparams,
 												 pinfo->execparams);
 
+		/* Record if an initial prune would be useful at any level */
+		prunestate->do_initial_prune |= pinfo->do_initial_prune;
+
 		relation_close(rel, NoLock);
 
 		i++;
 	}
 
-	/*
-	 * Cache the union of the paramids of both types.  This saves having to
-	 * recalculate it everytime we need to know what they are.
-	 */
-	prunestate->allparams = bms_union(prunestate->extparams,
-									  prunestate->execparams);
-
 	return prunestate;
 }
 
@@ -1543,9 +1541,9 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
  * ExecFindInitialMatchingSubPlans
  *		Determine which subset of subplan nodes we need to initialize based
  *		on the details stored in 'prunestate'.  Here we only determine the
- *		matching partitions using values known during plan startup, which is
- *		only external Params.  Exec Params will be unknown at this time.  We
- *		must delay pruning using exec Params until the actual executor run.
+ *		matching partitions using values known during plan startup, which
+ *		excludes attempting to evaluate any expressions containing exec
+ *		Params.
  *
  * It is expected that callers of this function do so only once during their
  * init plan.  The caller must only initialize the subnodes which are returned
@@ -1554,8 +1552,6 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
  * return its matching subnode indexes assuming that the caller discarded
  * the original non-matching subnodes.
  *
- * This function must only be called if 'prunestate' has any extparams.
- *
  * 'nsubnodes' must be passed as the total number of unpruned subnodes.
  */
 Bitmapset *
@@ -1565,11 +1561,7 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
 
-	/*
-	 * Ensure there's actually external params, or we've not been called
-	 * already.
-	 */
-	Assert(!bms_is_empty(prunestate->extparams));
+	Assert(prunestate->do_initial_prune);
 
 	pprune = prunestate->partprunedata;
 
@@ -1579,8 +1571,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	/* Determine which subnodes match the external params */
-	find_subplans_for_params_recurse(prunestate, pprune, false, &result);
+	/* Perform pruning without using exec params */
+	find_subplans_for_params_recurse(prunestate, pprune, true, &result);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1589,18 +1581,6 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 
 	MemoryContextReset(prunestate->prune_context);
 
-	/*
-	 * Record that partition pruning has been performed for external params.
-	 * These are not required again afterwards, and nullifying them helps
-	 * ensure nothing accidentally calls this function twice on the same
-	 * PartitionPruneState.
-	 *
-	 * (Note we keep prunestate->allparams, because we do use that one
-	 * repeatedly in ExecFindMatchingSubPlans).
-	 */
-	bms_free(prunestate->extparams);
-	prunestate->extparams = NULL;
-
 	/*
 	 * If any subnodes were pruned, we must re-sequence the subnode indexes so
 	 * that ExecFindMatchingSubPlans properly returns the indexes from the
@@ -1644,7 +1624,6 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 			 * for.  It seems easier to build a fresh one, rather than trying
 			 * to update the existing one.
 			 */
-			bms_free(pprune->present_parts);
 			pprune->present_parts = NULL;
 
 			for (j = 0; j < nparts; j++)
@@ -1669,6 +1648,41 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 			}
 		}
 
+		/*
+		 * Now we must determine which sub-partitioned tables still have
+		 * unpruned partitions.  The easiest way to do this is to simply loop
+		 * over each PartitionPruningData again checking if there are any
+		 * 'present_parts' in the sub-partitioned table.  We needn't bother
+		 * doing this if there are no sub-partitioned tables.
+		 */
+		if (prunestate->num_partprunedata > 1)
+		{
+			for (i = 0; i < prunestate->num_partprunedata; i++)
+			{
+				int			nparts;
+				int			j;
+
+				pprune = &prunestate->partprunedata[i];
+				nparts = pprune->context.nparts;
+
+				for (j = 0; j < nparts; j++)
+				{
+					int			subidx = pprune->subpart_map[j];
+
+					if (subidx >= 0)
+					{
+						PartitionPruningData *subprune;
+
+						subprune = &prunestate->partprunedata[subidx];
+
+						if (!bms_is_empty(subprune->present_parts))
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, j);
+					}
+				}
+			}
+		}
+
 		pfree(new_subnode_indexes);
 	}
 
@@ -1697,7 +1711,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	find_subplans_for_params_recurse(prunestate, pprune, true, &result);
+	find_subplans_for_params_recurse(prunestate, pprune, false, &result);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1717,43 +1731,34 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 static void
 find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 								 PartitionPruningData *pprune,
-								 bool allparams,
+								 bool initial_prune,
 								 Bitmapset **validsubplans)
 {
 	PartitionPruneContext *context = &pprune->context;
 	Bitmapset  *partset;
-	Bitmapset  *pruneparams;
 	int			i;
 
 	/* Guard against stack overflow due to overly deep partition hierarchy. */
 	check_stack_depth();
 
-	/*
-	 * Use only external params unless we've been asked to also use exec
-	 * params too.
-	 */
-	if (allparams)
-		pruneparams = pprune->allparams;
-	else
-		pruneparams = pprune->extparams;
-
-	/*
-	 * We only need to determine the matching partitions if there are any
-	 * params matching the partition key at this level.  If there are no
-	 * matching params, then we can simply return all subnodes which belong to
-	 * this parent partition.  The planner should have already determined
-	 * these to be the minimum possible set.  We must still recursively visit
-	 * any subpartitioned tables as we may find their partition keys match
-	 * some Params at their level.
-	 */
-	if (!bms_is_empty(pruneparams))
+	/* Only prune if pruning would be useful at this level. */
+	if ((initial_prune && pprune->do_initial_prune) ||
+		(!initial_prune && pprune->do_exec_prune))
 	{
-		context->safeparams = pruneparams;
+		/* Set whether we can evaluate exec params or not */
+		context->evalexecparams = !initial_prune;
+
 		partset = get_matching_partitions(context,
 										  pprune->pruning_steps);
 	}
 	else
+	{
+		/*
+		 * If no pruning is to be done, just include all partitions at this
+		 * level.
+		 */
 		partset = pprune->present_parts;
+	}
 
 	/* Translate partset into subnode indexes */
 	i = -1;
@@ -1769,7 +1774,7 @@ find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 			if (partidx != -1)
 				find_subplans_for_params_recurse(prunestate,
 												 &prunestate->partprunedata[partidx],
-												 allparams, validsubplans);
+												 initial_prune, validsubplans);
 			else
 			{
 				/*
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6bc3e470bf..707a3e0e4b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,11 +138,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 		prunestate = ExecSetupPartitionPruneState(&appendstate->ps,
 												  node->part_prune_infos);
 
-		/*
-		 * When there are external params matching the partition key we may be
-		 * able to prune away Append subplans now.
-		 */
-		if (!bms_is_empty(prunestate->extparams))
+		/* Perform an initial partition prune, if required. */
+		if (prunestate->do_initial_prune)
 		{
 			/* Determine which subplans match the external params */
 			validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c045a7afe..90c8d4a028 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2175,9 +2175,12 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 	COPY_NODE_FIELD(pruning_steps);
 	COPY_BITMAPSET_FIELD(present_parts);
 	COPY_SCALAR_FIELD(nparts);
+	COPY_SCALAR_FIELD(nexprs);
 	COPY_POINTER_FIELD(subnode_map, from->nparts * sizeof(int));
 	COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
-	COPY_BITMAPSET_FIELD(extparams);
+	COPY_POINTER_FIELD(hasexecparam, from->nexprs * sizeof(bool));
+	COPY_SCALAR_FIELD(do_initial_prune);
+	COPY_SCALAR_FIELD(do_exec_prune);
 	COPY_BITMAPSET_FIELD(execparams);
 
 	return newnode;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 610f9edaf5..9f6fb7c55f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1742,6 +1742,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
 	WRITE_NODE_FIELD(pruning_steps);
 	WRITE_BITMAPSET_FIELD(present_parts);
 	WRITE_INT_FIELD(nparts);
+	WRITE_INT_FIELD(nexprs);
 
 	appendStringInfoString(str, " :subnode_map");
 	for (i = 0; i < node->nparts; i++)
@@ -1751,7 +1752,12 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
 	for (i = 0; i < node->nparts; i++)
 		appendStringInfo(str, " %d", node->subpart_map[i]);
 
-	WRITE_BITMAPSET_FIELD(extparams);
+	appendStringInfoString(str, " :hasexecparam");
+	for (i = 0; i < node->nexprs; i++)
+		appendStringInfo(str, " %s", booltostr(node->hasexecparam[i]));
+
+	WRITE_BOOL_FIELD(do_initial_prune);
+	WRITE_BOOL_FIELD(do_exec_prune);
 	WRITE_BITMAPSET_FIELD(execparams);
 }
 
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2826cec2f8..a0335ccd76 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1363,9 +1363,12 @@ _readPartitionPruneInfo(void)
 	READ_NODE_FIELD(pruning_steps);
 	READ_BITMAPSET_FIELD(present_parts);
 	READ_INT_FIELD(nparts);
+	READ_INT_FIELD(nexprs);
 	READ_INT_ARRAY(subnode_map, local_node->nparts);
 	READ_INT_ARRAY(subpart_map, local_node->nparts);
-	READ_BITMAPSET_FIELD(extparams);
+	READ_BOOL_ARRAY(hasexecparam, local_node->nexprs);
+	READ_BOOL_FIELD(do_initial_prune);
+	READ_BOOL_FIELD(do_exec_prune);
 	READ_BITMAPSET_FIELD(execparams);
 
 	READ_DONE();
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a684d..2561c6b835 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -53,10 +53,12 @@
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
+#include "optimizer/var.h"
 #include "partitioning/partprune.h"
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/rel.h"
 
 
 /*
@@ -115,6 +117,14 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
+/*
+ * expression_tree_walker context struct for gathering paramids of params
+ * matching the partition key.
+ */
+typedef struct PullParamContext
+{
+	Bitmapset  *params;
+} PullParamContext;
 
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
@@ -162,7 +172,10 @@ static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
 static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
 						  StrategyNumber opstrategy, Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
-static bool pull_partkey_params(PartitionPruneInfo *pinfo, List *steps);
+static Bitmapset *pull_exec_paramids(Expr *expr);
+static bool pull_exec_paramids_walker(Node *node, PullParamContext * context);
+static bool analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps,
+					  int partnatts);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
 static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
@@ -197,7 +210,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	int		   *relid_subnode_map;
 	int		   *relid_subpart_map;
 	int			i;
-	bool		gotparam = false;
+	bool		doruntimeprune = false;
 
 	/*
 	 * Allocate two arrays to store the 1-based indexes of the 'subpaths' and
@@ -238,6 +251,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		RangeTblEntry *rte;
 		Bitmapset  *present_parts;
 		int			nparts = subpart->nparts;
+		int			partnatts = subpart->part_scheme->partnatts;
 		int		   *subnode_map;
 		int		   *subpart_map;
 		List	   *partprunequal;
@@ -320,17 +334,13 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		pinfo->pruning_steps = pruning_steps;
 		pinfo->present_parts = present_parts;
 		pinfo->nparts = nparts;
-		pinfo->extparams = NULL;
+		pinfo->nexprs = list_length(pruning_steps) * partnatts;
 		pinfo->execparams = NULL;
 		pinfo->subnode_map = subnode_map;
 		pinfo->subpart_map = subpart_map;
 
-		/*
-		 * Extract Params matching partition key and record if we got any.
-		 * We'll not bother enabling run-time pruning if no params matched the
-		 * partition key at any level of partitioning.
-		 */
-		gotparam |= pull_partkey_params(pinfo, pruning_steps);
+		/* Determine which pruning types should be enabled at this level */
+		doruntimeprune |= analyze_partkey_exprs(pinfo, pruning_steps, partnatts);
 
 		pinfolist = lappend(pinfolist, pinfo);
 	}
@@ -338,14 +348,10 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	pfree(relid_subnode_map);
 	pfree(relid_subpart_map);
 
-	if (gotparam)
+	if (doruntimeprune)
 		return pinfolist;
 
-	/*
-	 * If no Params were found to match the partition key on any of the
-	 * partitioned relations then there's no point doing any run-time
-	 * partition pruning.
-	 */
+	/* No run-time pruning required. */
 	return NIL;
 }
 
@@ -444,9 +450,10 @@ prune_append_rel_partitions(RelOptInfo *rel)
 	context.boundinfo = rel->boundinfo;
 
 	/* Not valid when being called from the planner */
+	context.evalexecparams = false;
 	context.planstate = NULL;
-	context.safeparams = NULL;
 	context.exprstates = NULL;
+	context.exprhasexecparam = NULL;
 
 	/* Actual pruning happens here. */
 	partindexes = get_matching_partitions(&context, pruning_steps);
@@ -1478,6 +1485,10 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		if (contain_volatile_functions((Node *) expr))
 			return PARTCLAUSE_UNSUPPORTED;
 
+		/* We can't prune using an expression with Vars. */
+		if (contain_var_clause((Node *) expr))
+			return PARTCLAUSE_UNSUPPORTED;
+
 		/*
 		 * Determine the input types of the operator we're considering.
 		 *
@@ -1655,7 +1666,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 					return PARTCLAUSE_UNSUPPORTED;
 			}
 			else
-				return PARTCLAUSE_UNSUPPORTED; /* no useful negator */
+				return PARTCLAUSE_UNSUPPORTED;	/* no useful negator */
 		}
 
 		/*
@@ -2683,54 +2694,96 @@ get_matching_range_bounds(PartitionPruneContext *context,
 }
 
 /*
- * pull_partkey_params
- *		Loop through each pruning step and record each external and exec
- *		Params being compared to the partition keys.
+ * pull_exec_paramids
+ *		Returns a Bitmapset containing the paramids of each Param with
+ *		paramkind = PARAM_EXEC in 'expr'.
+ */
+static Bitmapset *
+pull_exec_paramids(Expr *expr)
+{
+	PullParamContext context;
+
+	context.params = NULL;
+
+	pull_exec_paramids_walker((Node *) expr, &context);
+
+	return context.params;
+}
+
+static bool
+pull_exec_paramids_walker(Node *node, PullParamContext * context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXEC)
+			context->params = bms_add_member(context->params, param->paramid);
+		return false;
+	}
+	return expression_tree_walker(node, pull_exec_paramids_walker,
+								  (void *) context);
+}
+
+/*
+ * analyze_partkey_exprs
+ *		Loop through each pruning step, recording which ones compare exec
+ *		Params to the partition key.
+ *
+ * Returns true if run-time partition pruning should be attempted at this
+ * level.
  */
 static bool
-pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
+analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 {
 	ListCell   *lc;
-	bool		gotone = false;
+	bool		doruntimeprune = false;
+
+	pinfo->hasexecparam = palloc0(sizeof(bool) * pinfo->nexprs);
+	pinfo->do_initial_prune = false;
+	pinfo->do_exec_prune = false;
 
 	foreach(lc, steps)
 	{
-		PartitionPruneStepOp *stepop = lfirst(lc);
+		PartitionPruneStepOp *step = lfirst(lc);
 		ListCell   *lc2;
+		int			keyno;
 
-		if (!IsA(stepop, PartitionPruneStepOp))
+		if (!IsA(step, PartitionPruneStepOp))
 			continue;
 
-		foreach(lc2, stepop->exprs)
+		keyno = 0;
+		foreach(lc2, step->exprs)
 		{
 			Expr	   *expr = lfirst(lc2);
 
-			if (IsA(expr, Param))
+			if (!IsA(expr, Const))
 			{
-				Param	   *param = (Param *) expr;
-
-				switch (param->paramkind)
-				{
-					case PARAM_EXTERN:
-						pinfo->extparams = bms_add_member(pinfo->extparams,
-														  param->paramid);
-						break;
-					case PARAM_EXEC:
-						pinfo->execparams = bms_add_member(pinfo->execparams,
-														   param->paramid);
-						break;
+				Bitmapset  *execparams = pull_exec_paramids(expr);
+				bool		hasexecparams;
+				int			stateidx = PruneCxtStateIdx(partnatts,
+														step->step.step_id,
+														keyno);
+
+				hasexecparams = !bms_is_empty(execparams);
+				pinfo->hasexecparam[stateidx] = hasexecparams;
+				pinfo->execparams = bms_add_members(pinfo->execparams,
+													execparams);
+
+				if (!hasexecparams)
+					pinfo->do_initial_prune = true;
+				else
+					pinfo->do_exec_prune = true;
 
-					default:
-						elog(ERROR, "unrecognized paramkind: %d",
-							 (int) param->paramkind);
-						break;
-				}
-				gotone = true;
+				doruntimeprune = true;
 			}
+			keyno++;
 		}
 	}
 
-	return gotone;
+	return doruntimeprune;
 }
 
 /*
@@ -3031,37 +3084,40 @@ static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value)
 {
-	switch (nodeTag(expr))
+	if (IsA(expr, Const))
 	{
-		case T_Const:
-			*value = ((Const *) expr)->constvalue;
-			return true;
-
-		case T_Param:
-
-			/*
-			 * When being called from the executor we may be able to evaluate
-			 * the Param's value.
-			 */
-			if (context->planstate &&
-				bms_is_member(((Param *) expr)->paramid, context->safeparams))
-			{
-				ExprState  *exprstate;
-				ExprContext *ectx;
-				bool		isNull;
+		*value = ((Const *) expr)->constvalue;
+		return true;
+	}
+	else
+	{
+		/*
+		 * When called from the executor we'll have a valid planstate so we
+		 * may be able to evaluate the expression which could not be folded to
+		 * a Const during planning.  Since run-time pruning can occur both
+		 * during initialization of the executor or while it's running, we
+		 * must be careful here only to attempt to evaluate expressions
+		 * containing exec params when the executor is running.
+		 */
+		if (context->planstate &&
+			(context->evalexecparams ||
+			 !context->exprhasexecparam[stateidx]))
+		{
+			ExprState  *exprstate;
+			ExprContext *ectx;
+			bool		isNull;
 
-				exprstate = context->exprstates[stateidx];
-				ectx = context->planstate->ps_ExprContext;
-				*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-				if (isNull)
-					return false;
+			/* Exprs with volatile functions shouldn't make it here */
+			Assert(!contain_volatile_functions((Node *) expr));
 
-				return true;
-			}
-			break;
+			exprstate = context->exprstates[stateidx];
+			ectx = context->planstate->ps_ExprContext;
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
+			if (isNull)
+				return false;
 
-		default:
-			break;
+			return true;
+		}
 	}
 
 	return false;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fc6e9574e3..ed940a8a36 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -133,9 +133,12 @@ typedef struct PartitionTupleRouting
  *								the partition pruning code.
  * pruning_steps				Contains a list of PartitionPruneStep used to
  *								perform the actual pruning.
- * extparams					Contains paramids of external params found
+ * execparams					Contains paramids of exec params found
  *								matching partition keys in 'pruning_steps'.
- * allparams					As 'extparams' but also including exec params.
+ * do_initial_prune				true if pruning should be performed during
+ *								executor startup.
+ * do_exec_prune				true if pruning should be performed during
+ *								executor run.
  *-----------------------
  */
 typedef struct PartitionPruningData
@@ -145,8 +148,9 @@ typedef struct PartitionPruningData
 	Bitmapset  *present_parts;
 	PartitionPruneContext context;
 	List	   *pruning_steps;
-	Bitmapset  *extparams;
-	Bitmapset  *allparams;
+	Bitmapset  *execparams;
+	bool		do_initial_prune;
+	bool		do_exec_prune;
 } PartitionPruningData;
 
 /*-----------------------
@@ -163,24 +167,22 @@ typedef struct PartitionPruningData
  *						partitioned relation. First element contains the
  *						details for the target partitioned table.
  * num_partprunedata	Number of items in 'partprunedata' array.
+ * do_initial_prune		true if pruning should be performed during executor
+ *						startup.
  * prune_context		A memory context which can be used to call the query
  *						planner's partition prune functions.
- * extparams			All PARAM_EXTERN paramids which were found to match a
+ * execparams			All PARAM_EXEC paramids which were found to match a
  *						partition key in each of the contained
  *						PartitionPruningData structs.
- * execparams			As above but for PARAM_EXEC.
- * allparams			Union of 'extparams' and 'execparams', saved to avoid
- *						recalculation.
  *-----------------------
  */
 typedef struct PartitionPruneState
 {
 	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
+	bool		do_initial_prune;
 	MemoryContext prune_context;
-	Bitmapset  *extparams;
 	Bitmapset  *execparams;
-	Bitmapset  *allparams;
 } PartitionPruneState;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b2a1..e4c366b58f 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1598,9 +1598,16 @@ typedef struct PartitionPruneInfo
 	Bitmapset  *present_parts;	/* Indexes of all partitions which subnodes
 								 * are present for. */
 	int			nparts;			/* The length of the following two arrays */
+	int			nexprs;			/* Size of hasexecparam array */
 	int		   *subnode_map;	/* subnode index by partition id, or -1 */
 	int		   *subpart_map;	/* subpart index by partition id, or -1 */
-	Bitmapset  *extparams;		/* All external paramids seen in prunesteps */
+	bool	   *hasexecparam;	/* true if corresponding pruning_step has an
+								 * exec Param in the Expr being compared to
+								 * the partition key. */
+	bool		do_initial_prune;	/* true if pruning should be performed
+									 * during executor startup. */
+	bool		do_exec_prune;	/* true if pruning should be performed during
+								 * executor run. */
 	Bitmapset  *execparams;		/* All exec paramids seen in prunesteps */
 } PartitionPruneInfo;
 
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 3d114b4c71..028d4cecb6 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -36,6 +36,9 @@ typedef struct PartitionPruneContext
 	/* Number of partitions */
 	int			nparts;
 
+	/* true if it's safe to evaluate exec params */
+	bool		evalexecparams;
+
 	/* Partition boundary info */
 	PartitionBoundInfo boundinfo;
 
@@ -45,18 +48,15 @@ typedef struct PartitionPruneContext
 	 */
 	PlanState  *planstate;
 
-	/*
-	 * Parameters that are safe to be used for partition pruning. execparams
-	 * are not safe to use until the executor is running.
-	 */
-	Bitmapset  *safeparams;
-
 	/*
 	 * Array of ExprStates, indexed as per PruneCtxStateIdx; one for each
 	 * partkey in each pruning step.  Allocated if planstate is non-NULL,
 	 * otherwise NULL.
 	 */
 	ExprState **exprstates;
+
+	/* true if corresponding 'exprstates' expression contains an exec param */
+	bool	   *exprhasexecparam;
 } PartitionPruneContext;
 
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index cf331e79c1..64b4e933d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1835,6 +1835,54 @@ fetch backward all from cur;
 (2 rows)
 
 commit;
+begin;
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=1 loops=1)
+   Subplans Removed: 3
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(1))
+(4 rows)
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=4 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part2 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part3 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part4 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+(9 rows)
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part2 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part3 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part4 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+(13 rows)
+
+rollback;
 drop table list_part;
 -- Parallel append
 -- Suppress the number of loops each parallel node runs for.  This is because
@@ -2079,6 +2127,40 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
 (27 rows)
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+                                      explain_parallel_append                                      
+---------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=2 loops=1)
+         Workers Planned: 1
+         Workers Launched: 1
+         ->  Partial Aggregate (actual rows=1 loops=2)
+               ->  Nested Loop (actual rows=0 loops=2)
+                     ->  Parallel Seq Scan on lprt_a a (actual rows=51 loops=N)
+                           Filter: (a = ANY ('{0,0,1}'::integer[]))
+                     ->  Append (actual rows=0 loops=102)
+                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+(27 rows)
+
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
                                       explain_parallel_append                                      
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1464f4dcd9..b6681fa44c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -396,6 +396,22 @@ fetch backward all from cur;
 
 commit;
 
+begin;
+
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+
+rollback;
+
 drop table list_part;
 
 -- Parallel append
@@ -486,6 +502,10 @@ set enable_mergejoin = 0;
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+
 insert into lprt_a values(3),(3);
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#28)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

I'm really hoping this is what you meant about the special-case code for Params.
Does this look any better?

I'm starting to look this over and it seems like generally the right
thing, though I'm finding minor things I don't like much.

One thing I'm wondering about is why in the world are PartitionPruneInfo
and its subsidiary struct types declared in primnodes.h? They are not
general-purpose expression nodes, or if they are then there are an awful
lot of places that should know about them and do not. AFAICT they might
belong in plannodes.h.

regards, tom lane

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#29)
Re: why partition pruning doesn't work?

I wrote:

One thing I'm wondering about is why in the world are PartitionPruneInfo
and its subsidiary struct types declared in primnodes.h?

Oh, and while I'm bitching: it seems like there is hardly any part of
the partitioning code in which the comments aren't desperately in need
of a copy-editing pass. They are just chock-full of misspellings,
grammar that is faulty enough to make the meaning unclear, and/or
errors of fact. An example of the latter is the repeated claims that
the basic partitioning functions belong to the planner. Maybe that
was true at some stage of development; but AFAICS the logic in question
now lives in src/backend/partitioning/, which I would not think is
part of the planner.

regards, tom lane

#31David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#30)
1 attachment(s)
Re: why partition pruning doesn't work?

On 10 June 2018 at 09:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

One thing I'm wondering about is why in the world are PartitionPruneInfo
and its subsidiary struct types declared in primnodes.h?

That may have been a legacy thing that accidentally wasn't changed
from a previous version of the patch. I've now moved it to
plannodes.h.

Oh, and while I'm bitching: it seems like there is hardly any part of
the partitioning code in which the comments aren't desperately in need
of a copy-editing pass. They are just chock-full of misspellings,
grammar that is faulty enough to make the meaning unclear, and/or
errors of fact. An example of the latter is the repeated claims that
the basic partitioning functions belong to the planner. Maybe that
was true at some stage of development; but AFAICS the logic in question
now lives in src/backend/partitioning/, which I would not think is
part of the planner.

I've made a pass over the execPartition.c and partprune.c code
attempting to resolve these issues. I have hopefully fixed them all,
but I apologise if I've missed any.

I also couldn't resist making a few other improvements to the code.
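
One of those changes, visible in the ExecFindInitialMatchingSubPlans hunk of the attached patch, re-sequences subplan indexes with a zero-filled array of 1-based indexes, so that subtracting one maps pruned entries to -1. What follows is only a minimal standalone sketch of that idea, not the patch's code: it assumes plain C arrays instead of Bitmapsets and palloc, and the function name resequence_subplan_map and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * 'survives[i]' says whether original subplan i remains after initial
 * pruning.  'subplan_map' has one entry per partition: the index of its
 * subplan, or -1 if it has none.  On return the map refers to positions
 * in the compacted subplan list.  Returns the number of surviving subplans.
 */
int
resequence_subplan_map(const bool *survives, int nsubplans,
                       int *subplan_map, int nparts)
{
    int *new_index;
    int  next = 1;
    int  i;

    assert(nsubplans > 0 && nparts >= 0);

    /* calloc zero-fills, so 0 means "pruned"; survivors get 1-based indexes */
    new_index = calloc((size_t) nsubplans, sizeof(int));
    if (new_index == NULL)
        abort();

    for (i = 0; i < nsubplans; i++)
    {
        if (survives[i])
            new_index[i] = next++;
    }

    for (i = 0; i < nparts; i++)
    {
        int old = subplan_map[i];

        if (old >= 0)
        {
            assert(old < nsubplans);
            /* pruned entries become 0 - 1 = -1, others shift down */
            subplan_map[i] = new_index[old] - 1;
        }
    }

    free(new_index);
    return next - 1;
}
```

With four original subplans of which only the second and fourth survive, a map of {0, 1, -1, 2, 3} would become {-1, 0, -1, -1, 1}. The 1-based trick avoids a separate pass to fill the array with -1.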

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

run-time_pruning_for_exprs_v5.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c83991c93c..f185e47889 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -48,9 +48,9 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 bool *isnull,
 									 int maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_subplans_for_params_recurse(PartitionPruneState *prunestate,
+static int find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 								 PartitionPruningData *pprune,
-								 bool allparams,
+								 bool initial_prune,
 								 Bitmapset **validsubplans);
 
 
@@ -1334,54 +1334,53 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * Run-Time Partition Pruning Support.
  *
  * The following series of functions exist to support the removal of unneeded
- * subnodes for queries against partitioned tables.  The supporting functions
+ * subplans for queries against partitioned tables.  The supporting functions
  * here are designed to work with any node type which supports an arbitrary
- * number of subnodes, e.g. Append, MergeAppend.
+ * number of subplans, e.g. Append, MergeAppend.
  *
- * Normally this pruning work is performed by the query planner's partition
- * pruning code, however, the planner is limited to only being able to prune
- * away unneeded partitions using quals which compare the partition key to a
- * value which is known to be Const during planning.  To allow the same
- * pruning to be performed for values which are only determined during
- * execution, we must make an additional pruning attempt during execution.
+ * Run-time pruning works in addition to plan-time pruning, however, during
+ * query planning, pruning is limited to only being able to use expressions
+ * which are known Consts.  To allow pruning to be performed using values which
+ * aren't known during planning, we must make an additional pruning attempt
+ * during execution.
  *
- * Here we support pruning using both external and exec Params.  The main
- * difference between these that we need to concern ourselves with is the
- * time when the values of the Params are known.  External Param values are
- * known at any time of execution, including executor startup, but exec Param
- * values are only known when the executor is running.
+ * Here we support pruning using any non-volatile expression, thus allowing
+ * pruning to be performed using both external and exec params, in fact, any
+ * stable expression approved by gen_partprune_steps can be evaluated and
+ * used here.  However, if an expression contains an exec param it cannot be
+ * evaluated until during execution. For everything else, we can perform
+ * pruning during executor startup.
  *
- * For external Params we may be able to prune away unneeded partitions
- * during executor startup.  This has the added benefit of not having to
- * initialize the unneeded subnodes at all.  This is useful as it can save
- * quite a bit of effort during executor startup.
+ * When performing pruning using expressions containing exec params, we must
+ * perform partition pruning again each time one of these params changes.  It
+ * is the calling code's responsibility to ensure that this happens.
  *
- * For exec Params, we must delay pruning until the executor is running.
+ * Having the ability to prune away unneeded subplans during executor startup
+ * has the added benefit of not having to initialize the unneeded subplans.
  *
  * Functions:
  *
  * ExecSetupPartitionPruneState:
- *		This must be called by nodes before any partition pruning is
- *		attempted.  Normally executor startup is a good time. This function
- *		creates the PartitionPruneState details which are required by each
- *		of the two pruning functions, details include information about
- *		how to map the partition index details which are returned by the
- *		planner's partition prune function into subnode indexes.
+ *		Creates the PartitionPruneState as required by each of the two pruning
+ *		functions.  Details stored include how to map the partition index
+ *		returned by the partition pruning code into subplan indexes.
  *
  * ExecFindInitialMatchingSubPlans:
- *		Returns indexes of matching subnodes utilizing only external Params
- *		to eliminate subnodes.  The function must only be called during
- *		executor startup for the given node before the subnodes themselves
- *		are initialized.  Subnodes which are found not to match by this
- *		function must not be included in the node's list of subnodes as this
- *		function performs a remap of the partition index to subplan index map
- *		and the newly created map provides indexes only for subnodes which
- *		remain after calling this function.
+ *		Returns indexes of matching subplans.  Here partition pruning is
+ *		performed using all expressions found in the partition pruning steps
+ *		apart from expressions containing exec params.  This function must
+ *		only be called once and must only be called during initialization of
+ *		the node which it applies to.  Subplans which are found not to match
+ *		by this function must not be included in the node's list of subplans
+ *		as this function performs a remap of the partition index to subplan
+ *		index map and the newly created map provides indexes only for subplans
+ *		which remain after calling this function.
  *
  * ExecFindMatchingSubPlans:
- *		Returns indexes of matching subnodes utilizing all Params to eliminate
- *		subnodes which can't possibly contain matching tuples.  This function
- *		can only be called while the executor is running.
+ *		Returns indexes of matching subplans evaluating all expressions in the
+ *		pruning steps, including exec params.  This function can only be
+ *		called during execution and must be called again each time the value
+ *		of a param listed in PartitionPruneState's 'execparams' changes.
  *-------------------------------------------------------------------------
  */
 
@@ -1392,32 +1391,29 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *
  * 'partitionpruneinfo' is a List of PartitionPruneInfos as generated by
  * make_partition_pruneinfo.  Here we build a PartitionPruneContext for each
- * item in the List.  These contexts can be re-used each time we re-evaulate
+ * item in that List.  These contexts can be re-used each time we reevaluate
  * which partitions match the pruning steps provided in each
  * PartitionPruneInfo.
  */
 PartitionPruneState *
 ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 {
-	PartitionPruningData *prunedata;
 	PartitionPruneState *prunestate;
 	ListCell   *lc;
 	int			i;
+	int			num_partprunedata;
+	Size		size;
 
 	Assert(partitionpruneinfo != NIL);
 
-	prunestate = (PartitionPruneState *) palloc(sizeof(PartitionPruneState));
-	prunedata = (PartitionPruningData *)
-		palloc(sizeof(PartitionPruningData) * list_length(partitionpruneinfo));
+	num_partprunedata = list_length(partitionpruneinfo);
+	size = offsetof(PartitionPruneState, partprunedata);
+	size += sizeof(PartitionPruningData) * num_partprunedata;
 
-	/*
-	 * The first item in the array contains the details for the query's target
-	 * partition, so record that as the root of the partition hierarchy.
-	 */
-	prunestate->partprunedata = prunedata;
-	prunestate->num_partprunedata = list_length(partitionpruneinfo);
-	prunestate->extparams = NULL;
+	prunestate = (PartitionPruneState *) palloc(size);
+	prunestate->num_partprunedata = num_partprunedata;
 	prunestate->execparams = NULL;
+	prunestate->do_initial_prune = false;	/* may be set below */
 
 	/*
 	 * Create a sub memory context which we'll use when making calls to the
@@ -1435,7 +1431,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 	foreach(lc, partitionpruneinfo)
 	{
 		PartitionPruneInfo *pinfo = (PartitionPruneInfo *) lfirst(lc);
-		PartitionPruningData *pprune = &prunedata[i];
+		PartitionPruningData *pprune = &prunestate->partprunedata[i];
 		PartitionPruneContext *context = &pprune->context;
 		PartitionDesc partdesc;
 		Relation	rel;
@@ -1444,24 +1440,24 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		int			partnatts;
 		int			n_steps;
 
-		pprune->present_parts = bms_copy(pinfo->present_parts);
-		pprune->subnode_map = palloc(sizeof(int) * pinfo->nparts);
+		pprune->present_parts = pinfo->present_parts;
+		pprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
 
 		/*
 		 * We must make a copy of this rather than pointing directly to the
 		 * plan's version as we may end up making modifications to it later.
 		 */
-		memcpy(pprune->subnode_map, pinfo->subnode_map,
+		memcpy(pprune->subplan_map, pinfo->subplan_map,
 			   sizeof(int) * pinfo->nparts);
 
-		/* We can use the subpart_map verbatim, since we never modify it */
+		/* We can use the subpart_map verbatim since we never modify it */
 		pprune->subpart_map = pinfo->subpart_map;
 
 		/*
 		 * Grab some info from the table's relcache; lock was already obtained
 		 * by ExecLockNonLeafAppendTables.
 		 */
-		rel = relation_open(pinfo->reloid, NoLock);
+		rel = heap_open(pinfo->reloid, NoLock);
 
 		partkey = RelationGetPartitionKey(rel);
 		partdesc = RelationGetPartitionDesc(rel);
@@ -1476,9 +1472,14 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		context->nparts = pinfo->nparts;
 		context->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
 		context->planstate = planstate;
-		context->safeparams = NULL; /* empty for now */
 		context->exprstates = palloc0(sizeof(ExprState *) * n_steps * partnatts);
 
+		/*
+		 * Use the hasexecparam. This is not modified anywhere, so we just
+		 * borrow the plan's copy.
+		 */
+		context->exprhasexecparam = pinfo->hasexecparam;
+
 		/* Initialize expression states for each expression */
 		foreach(lc2, pinfo->pruning_steps)
 		{
@@ -1511,65 +1512,51 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		}
 
 		pprune->pruning_steps = pinfo->pruning_steps;
-		pprune->extparams = bms_copy(pinfo->extparams);
-		pprune->allparams = bms_union(pinfo->extparams, pinfo->execparams);
+		pprune->execparams = pinfo->execparams;
+		pprune->do_initial_prune = pinfo->do_initial_prune;
+		pprune->do_exec_prune = pinfo->do_exec_prune;
 
 		/*
-		 * Accumulate the paramids which match the partitioned keys of all
-		 * partitioned tables.
+		 * Accumulate the exec paramids which match the partitioned keys of
+		 * all partitioned tables.
 		 */
-		prunestate->extparams = bms_add_members(prunestate->extparams,
-												pinfo->extparams);
-
 		prunestate->execparams = bms_add_members(prunestate->execparams,
 												 pinfo->execparams);
 
-		relation_close(rel, NoLock);
+		/*
+		 * Record if an initial prune would be useful at any level of the
+		 * partition hierarchy.
+		 */
+		prunestate->do_initial_prune |= pinfo->do_initial_prune;
+
+		heap_close(rel, NoLock);
 
 		i++;
 	}
 
-	/*
-	 * Cache the union of the paramids of both types.  This saves having to
-	 * recalculate it everytime we need to know what they are.
-	 */
-	prunestate->allparams = bms_union(prunestate->extparams,
-									  prunestate->execparams);
-
 	return prunestate;
 }
 
 /*
  * ExecFindInitialMatchingSubPlans
- *		Determine which subset of subplan nodes we need to initialize based
- *		on the details stored in 'prunestate'.  Here we only determine the
- *		matching partitions using values known during plan startup, which is
- *		only external Params.  Exec Params will be unknown at this time.  We
- *		must delay pruning using exec Params until the actual executor run.
- *
- * It is expected that callers of this function do so only once during their
- * init plan.  The caller must only initialize the subnodes which are returned
- * by this function. The remaining subnodes should be discarded.  Once this
- * function has been called, future calls to ExecFindMatchingSubPlans will
- * return its matching subnode indexes assuming that the caller discarded
- * the original non-matching subnodes.
+ *		Determine the minimum set of subplans matching these pruning steps
+ *		without evaluation of exec Params.  We also re-map the translation
+ *		matrix which allows conversion of partition indexes into subplan index
+ *		to account for the unneeded subplans having been removed.
  *
- * This function must only be called if 'prunestate' has any extparams.
+ * Must only be called once per 'prunestate'.
  *
- * 'nsubnodes' must be passed as the total number of unpruned subnodes.
+ * 'nsubplans' must be passed as the total number of unpruned subplans.
  */
 Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 {
 	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
+	int			matches;
 
-	/*
-	 * Ensure there's actually external params, or we've not been called
-	 * already.
-	 */
-	Assert(!bms_is_empty(prunestate->extparams));
+	Assert(prunestate->do_initial_prune);
 
 	pprune = prunestate->partprunedata;
 
@@ -1579,8 +1566,9 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	/* Determine which subnodes match the external params */
-	find_subplans_for_params_recurse(prunestate, pprune, false, &result);
+	/* Perform pruning without using exec params */
+	matches = find_subplans_for_params_recurse(prunestate, pprune, true,
+											   &result);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1590,86 +1578,113 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 	MemoryContextReset(prunestate->prune_context);
 
 	/*
-	 * Record that partition pruning has been performed for external params.
-	 * These are not required again afterwards, and nullifying them helps
-	 * ensure nothing accidentally calls this function twice on the same
-	 * PartitionPruneState.
-	 *
-	 * (Note we keep prunestate->allparams, because we do use that one
-	 * repeatedly in ExecFindMatchingSubPlans).
-	 */
-	bms_free(prunestate->extparams);
-	prunestate->extparams = NULL;
-
-	/*
-	 * If any subnodes were pruned, we must re-sequence the subnode indexes so
+	 * If any subplans were pruned, we must re-sequence the subplan indexes so
 	 * that ExecFindMatchingSubPlans properly returns the indexes from the
-	 * subnodes which will remain after execution of this function.
+	 * subplans which will remain after execution of this function.
 	 */
-	if (bms_num_members(result) < nsubnodes)
+	if (matches < nsubplans)
 	{
-		int		   *new_subnode_indexes;
+		int		   *new_subplan_indexes;
 		int			i;
 		int			newidx;
+		int			num_partprunedata;
 
 		/*
-		 * First we must build an array which we can use to adjust the
-		 * existing subnode_map so that it contains the new subnode indexes.
+		 * First we must build an array which stores the new 1-based index
+		 * of the subplan node.  Elements which are set to 0 after this are
+		 * newly pruned partitions.
 		 */
-		new_subnode_indexes = (int *) palloc(sizeof(int) * nsubnodes);
-		newidx = 0;
-		for (i = 0; i < nsubnodes; i++)
-		{
-			if (bms_is_member(i, result))
-				new_subnode_indexes[i] = newidx++;
-			else
-				new_subnode_indexes[i] = -1;	/* Newly pruned */
-		}
+		new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
+		newidx = 1;
+		i = -1;
+		while ((i = bms_next_member(result, i)) >= 0)
+			new_subplan_indexes[i] = newidx++;
+
+		num_partprunedata = prunestate->num_partprunedata;
 
 		/*
-		 * Now we can re-sequence each PartitionPruneInfo's subnode_map so
-		 * that they point to the new index of the subnode.
+		 * Now we can re-sequence each PartitionPruneInfo's subplan_map so
+		 * that they point to the new index of the subplan.
 		 */
-		for (i = 0; i < prunestate->num_partprunedata; i++)
+		for (i = 0; i < num_partprunedata; i++)
 		{
-			int			nparts;
+			Bitmapset  *new_parts = NULL;
 			int			j;
 
 			pprune = &prunestate->partprunedata[i];
-			nparts = pprune->context.nparts;
 
-			/*
-			 * We also need to reset the present_parts field so that it only
-			 * contains partition indexes that we actually still have subnodes
-			 * for.  It seems easier to build a fresh one, rather than trying
-			 * to update the existing one.
-			 */
-			bms_free(pprune->present_parts);
-			pprune->present_parts = NULL;
-
-			for (j = 0; j < nparts; j++)
+			/* Redetermine which partitions are now present. */
+			j = -1;
+			while ((j = bms_next_member(pprune->present_parts, j)) >= 0)
 			{
-				int			oldidx = pprune->subnode_map[j];
+				int			oldidx = pprune->subplan_map[j];
 
 				/*
-				 * If this partition existed as a subnode then change the old
-				 * subnode index to the new subnode index.  The new index may
+				 * If this partition existed as a subplan then change the old
+				 * subplan index to the new subplan index.  The new index may
 				 * become -1 if the partition was pruned above, or it may just
-				 * come earlier in the subnode list due to some subnodes being
+				 * come earlier in the subplan list due to some subplans being
 				 * removed earlier in the list.
 				 */
 				if (oldidx >= 0)
 				{
-					pprune->subnode_map[j] = new_subnode_indexes[oldidx];
+					int			newidx = new_subplan_indexes[oldidx] - 1;
+
+					pprune->subplan_map[j] = newidx;
 
-					if (new_subnode_indexes[oldidx] >= 0)
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
+					if (newidx >= 0)
+						new_parts = bms_add_member(new_parts, j);
 				}
 			}
+
+			/*
+			 * Replace without pfreeing the original, since that memory
+			 * belongs to the plan.
+			 */
+			pprune->present_parts = new_parts;
 		}
 
-		pfree(new_subnode_indexes);
+		/*
+		 * Now we must determine which sub-partitioned tables still have
+		 * unpruned partitions.  The easiest way to do this is to simply loop
+		 * over each PartitionPruningData again checking if there are any
+		 * 'present_parts' for the sub-partitioned table.  We needn't bother
+		 * doing this if there are no sub-partitioned tables.
+		 */
+		if (num_partprunedata > 1)
+		{
+			for (i = 0; i < num_partprunedata; i++)
+			{
+				int			nparts;
+				int			j;
+
+				pprune = &prunestate->partprunedata[i];
+				nparts = pprune->context.nparts;
+
+				/*
+				 * XXX if we still had the old present_parts available then we
+				 * could do a bms_next_member() loop here.  Is it worth
+				 * stashing it somewhere to allow that?
+				 */
+				for (j = 0; j < nparts; j++)
+				{
+					int			subidx = pprune->subpart_map[j];
+
+					if (subidx >= 0)
+					{
+						PartitionPruningData *subprune;
+
+						subprune = &prunestate->partprunedata[subidx];
+
+						if (!bms_is_empty(subprune->present_parts))
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, j);
+					}
+				}
+			}
+		}
+
+		pfree(new_subplan_indexes);
 	}
 
 	return result;
@@ -1677,10 +1692,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubnodes)
 
 /*
  * ExecFindMatchingSubPlans
- *		Determine which subplans match the pruning steps detailed in
- *		'pprune' for the current Param values.
- *
- * Here we utilize both external and exec Params for pruning.
+ *		Determine which subplans are required based on the sets of pruning
+ *		steps stored in 'prunestate'.
  */
 Bitmapset *
 ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
@@ -1697,7 +1710,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	find_subplans_for_params_recurse(prunestate, pprune, true, &result);
+	find_subplans_for_params_recurse(prunestate, pprune, false, &result);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1712,73 +1725,68 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 /*
  * find_subplans_for_params_recurse
  *		Recursive worker function for ExecFindMatchingSubPlans and
- *		ExecFindInitialMatchingSubPlans
+ *		ExecFindInitialMatchingSubPlans.
+ *
+ * Returns the number of matching subplans found.
  */
-static void
+static int
 find_subplans_for_params_recurse(PartitionPruneState *prunestate,
 								 PartitionPruningData *pprune,
-								 bool allparams,
+								 bool initial_prune,
 								 Bitmapset **validsubplans)
 {
 	PartitionPruneContext *context = &pprune->context;
 	Bitmapset  *partset;
-	Bitmapset  *pruneparams;
+	int			matches = 0;
+	int			partidx;
 	int			i;
 
 	/* Guard against stack overflow due to overly deep partition hierarchy. */
 	check_stack_depth();
 
 	/*
-	 * Use only external params unless we've been asked to also use exec
-	 * params too.
-	 */
-	if (allparams)
-		pruneparams = pprune->allparams;
-	else
-		pruneparams = pprune->extparams;
-
-	/*
-	 * We only need to determine the matching partitions if there are any
-	 * params matching the partition key at this level.  If there are no
-	 * matching params, then we can simply return all subnodes which belong to
-	 * this parent partition.  The planner should have already determined
-	 * these to be the minimum possible set.  We must still recursively visit
-	 * any subpartitioned tables as we may find their partition keys match
-	 * some Params at their level.
+	 * Only prune if pruning would be useful at this level.  Pruning is free
+	 * to evaluate exec params when initial_prune is false.
 	 */
-	if (!bms_is_empty(pruneparams))
+	if (!initial_prune && pprune->do_exec_prune)
 	{
-		context->safeparams = pruneparams;
+		context->evalexecparams = true;
+		partset = get_matching_partitions(context,
+										  pprune->pruning_steps);
+	}
+	else if (initial_prune && pprune->do_initial_prune)
+	{
+		context->evalexecparams = false;
 		partset = get_matching_partitions(context,
 										  pprune->pruning_steps);
 	}
 	else
+	{
+		/* No pruning required?  Just include all partitions. */
 		partset = pprune->present_parts;
+	}
 
-	/* Translate partset into subnode indexes */
+	/* Translate partset into subplan indexes */
 	i = -1;
 	while ((i = bms_next_member(partset, i)) >= 0)
 	{
-		if (pprune->subnode_map[i] >= 0)
+		if (pprune->subplan_map[i] >= 0)
+		{
 			*validsubplans = bms_add_member(*validsubplans,
-											pprune->subnode_map[i]);
+											pprune->subplan_map[i]);
+			matches++;
+		}
+
+		/* Recurse if the partition is a partitioned table */
+		else if ((partidx = pprune->subpart_map[i]) >= 0)
+			matches += find_subplans_for_params_recurse(prunestate,
+														&prunestate->partprunedata[partidx],
+														initial_prune, validsubplans);
 		else
 		{
-			int			partidx = pprune->subpart_map[i];
-
-			if (partidx != -1)
-				find_subplans_for_params_recurse(prunestate,
-												 &prunestate->partprunedata[partidx],
-												 allparams, validsubplans);
-			else
-			{
-				/*
-				 * This could only happen if clauses used in planning where
-				 * more restrictive than those used here, or if the maps are
-				 * somehow corrupt.
-				 */
-				elog(ERROR, "partition missing from subplans");
-			}
+			/* Shouldn't happen */
+			elog(ERROR, "partition missing from subplans");
 		}
 	}
+	return matches;
 }
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6bc3e470bf..707a3e0e4b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,11 +138,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 		prunestate = ExecSetupPartitionPruneState(&appendstate->ps,
 												  node->part_prune_infos);
 
-		/*
-		 * When there are external params matching the partition key we may be
-		 * able to prune away Append subplans now.
-		 */
-		if (!bms_is_empty(prunestate->extparams))
+		/* Perform an initial partition prune, if required. */
+		if (prunestate->do_initial_prune)
 		{
 			/* Determine which subplans match the external params */
 			validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c045a7afe..4c6bddaa35 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1190,6 +1190,29 @@ _copyPlanInvalItem(const PlanInvalItem *from)
 	return newnode;
 }
 
+/*
+ * _copyPartitionPruneInfo
+ */
+static PartitionPruneInfo *
+_copyPartitionPruneInfo(const PartitionPruneInfo *from)
+{
+	PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
+
+	COPY_SCALAR_FIELD(reloid);
+	COPY_NODE_FIELD(pruning_steps);
+	COPY_BITMAPSET_FIELD(present_parts);
+	COPY_SCALAR_FIELD(nparts);
+	COPY_SCALAR_FIELD(nexprs);
+	COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
+	COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
+	COPY_POINTER_FIELD(hasexecparam, from->nexprs * sizeof(bool));
+	COPY_SCALAR_FIELD(do_initial_prune);
+	COPY_SCALAR_FIELD(do_exec_prune);
+	COPY_BITMAPSET_FIELD(execparams);
+
+	return newnode;
+}
+
 /* ****************************************************************
  *					   primnodes.h copy functions
  * ****************************************************************
@@ -2166,23 +2189,6 @@ _copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
 	return newnode;
 }
 
-static PartitionPruneInfo *
-_copyPartitionPruneInfo(const PartitionPruneInfo *from)
-{
-	PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
-
-	COPY_SCALAR_FIELD(reloid);
-	COPY_NODE_FIELD(pruning_steps);
-	COPY_BITMAPSET_FIELD(present_parts);
-	COPY_SCALAR_FIELD(nparts);
-	COPY_POINTER_FIELD(subnode_map, from->nparts * sizeof(int));
-	COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
-	COPY_BITMAPSET_FIELD(extparams);
-	COPY_BITMAPSET_FIELD(execparams);
-
-	return newnode;
-}
-
 /* ****************************************************************
  *						relation.h copy functions
  *
@@ -4904,6 +4910,9 @@ copyObjectImpl(const void *from)
 		case T_PlanInvalItem:
 			retval = _copyPlanInvalItem(from);
 			break;
+		case T_PartitionPruneInfo:
+			retval = _copyPartitionPruneInfo(from);
+			break;
 
 			/*
 			 * PRIMITIVE NODES
@@ -5089,9 +5098,6 @@ copyObjectImpl(const void *from)
 		case T_PlaceHolderInfo:
 			retval = _copyPlaceHolderInfo(from);
 			break;
-		case T_PartitionPruneInfo:
-			retval = _copyPartitionPruneInfo(from);
-			break;
 
 			/*
 			 * VALUE NODES
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 610f9edaf5..b6b53a2de3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1019,6 +1019,36 @@ _outPlanInvalItem(StringInfo str, const PlanInvalItem *node)
 	WRITE_UINT_FIELD(hashValue);
 }
 
+static void
+_outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
+{
+	int			i;
+
+	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+
+	WRITE_OID_FIELD(reloid);
+	WRITE_NODE_FIELD(pruning_steps);
+	WRITE_BITMAPSET_FIELD(present_parts);
+	WRITE_INT_FIELD(nparts);
+	WRITE_INT_FIELD(nexprs);
+
+	appendStringInfoString(str, " :subplan_map");
+	for (i = 0; i < node->nparts; i++)
+		appendStringInfo(str, " %d", node->subplan_map[i]);
+
+	appendStringInfoString(str, " :subpart_map");
+	for (i = 0; i < node->nparts; i++)
+		appendStringInfo(str, " %d", node->subpart_map[i]);
+
+	appendStringInfoString(str, " :hasexecparam");
+	for (i = 0; i < node->nexprs; i++)
+		appendStringInfo(str, " %s", booltostr(node->hasexecparam[i]));
+
+	WRITE_BOOL_FIELD(do_initial_prune);
+	WRITE_BOOL_FIELD(do_exec_prune);
+	WRITE_BITMAPSET_FIELD(execparams);
+}
+
 /*****************************************************************************
  *
  *	Stuff from primnodes.h.
@@ -1731,30 +1761,6 @@ _outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
 	WRITE_NODE_FIELD(exclRelTlist);
 }
 
-static void
-_outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
-{
-	int			i;
-
-	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
-
-	WRITE_OID_FIELD(reloid);
-	WRITE_NODE_FIELD(pruning_steps);
-	WRITE_BITMAPSET_FIELD(present_parts);
-	WRITE_INT_FIELD(nparts);
-
-	appendStringInfoString(str, " :subnode_map");
-	for (i = 0; i < node->nparts; i++)
-		appendStringInfo(str, " %d", node->subnode_map[i]);
-
-	appendStringInfoString(str, " :subpart_map");
-	for (i = 0; i < node->nparts; i++)
-		appendStringInfo(str, " %d", node->subpart_map[i]);
-
-	WRITE_BITMAPSET_FIELD(extparams);
-	WRITE_BITMAPSET_FIELD(execparams);
-}
-
 /*****************************************************************************
  *
  *	Stuff from relation.h.
@@ -3824,6 +3830,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PlanInvalItem:
 				_outPlanInvalItem(str, obj);
 				break;
+			case T_PartitionPruneInfo:
+				_outPartitionPruneInfo(str, obj);
+				break;
 			case T_Alias:
 				_outAlias(str, obj);
 				break;
@@ -3983,9 +3992,6 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionPruneStepCombine:
 				_outPartitionPruneStepCombine(str, obj);
 				break;
-			case T_PartitionPruneInfo:
-				_outPartitionPruneInfo(str, obj);
-				break;
 			case T_Path:
 				_outPath(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2826cec2f8..18bbf37e14 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1354,23 +1354,6 @@ _readPartitionPruneStepCombine(void)
 	READ_DONE();
 }
 
-static PartitionPruneInfo *
-_readPartitionPruneInfo(void)
-{
-	READ_LOCALS(PartitionPruneInfo);
-
-	READ_OID_FIELD(reloid);
-	READ_NODE_FIELD(pruning_steps);
-	READ_BITMAPSET_FIELD(present_parts);
-	READ_INT_FIELD(nparts);
-	READ_INT_ARRAY(subnode_map, local_node->nparts);
-	READ_INT_ARRAY(subpart_map, local_node->nparts);
-	READ_BITMAPSET_FIELD(extparams);
-	READ_BITMAPSET_FIELD(execparams);
-
-	READ_DONE();
-}
-
 /*
  *	Stuff from parsenodes.h.
  */
@@ -2376,6 +2359,29 @@ _readPlanInvalItem(void)
 	READ_DONE();
 }
 
+/*
+ * _readPartitionPruneInfo
+ */
+static PartitionPruneInfo *
+_readPartitionPruneInfo(void)
+{
+	READ_LOCALS(PartitionPruneInfo);
+
+	READ_OID_FIELD(reloid);
+	READ_NODE_FIELD(pruning_steps);
+	READ_BITMAPSET_FIELD(present_parts);
+	READ_INT_FIELD(nparts);
+	READ_INT_FIELD(nexprs);
+	READ_INT_ARRAY(subplan_map, local_node->nparts);
+	READ_INT_ARRAY(subpart_map, local_node->nparts);
+	READ_BOOL_ARRAY(hasexecparam, local_node->nexprs);
+	READ_BOOL_FIELD(do_initial_prune);
+	READ_BOOL_FIELD(do_exec_prune);
+	READ_BITMAPSET_FIELD(execparams);
+
+	READ_DONE();
+}
+
 /*
  * _readSubPlan
  */
@@ -2620,8 +2626,6 @@ parseNodeString(void)
 		return_value = _readPartitionPruneStepOp();
 	else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
 		return_value = _readPartitionPruneStepCombine();
-	else if (MATCH("PARTITIONPRUNEINFO", 18))
-		return_value = _readPartitionPruneInfo();
 	else if (MATCH("RTE", 3))
 		return_value = _readRangeTblEntry();
 	else if (MATCH("RANGETBLFUNCTION", 16))
@@ -2724,6 +2728,8 @@ parseNodeString(void)
 		return_value = _readPlanRowMark();
 	else if (MATCH("PLANINVALITEM", 13))
 		return_value = _readPlanInvalItem();
+	else if (MATCH("PARTITIONPRUNEINFO", 18))
+		return_value = _readPartitionPruneInfo();
 	else if (MATCH("SUBPLAN", 7))
 		return_value = _readSubPlan();
 	else if (MATCH("ALTERNATIVESUBPLAN", 18))
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 58ec2a684d..640105ddf2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -25,7 +25,7 @@
  * the outputs of some other steps using the appropriate combination method.
  * All steps that are constructed are executed in succession such that for any
  * "combine" step, all of the steps whose output it depends on are executed
- * first and their ouput preserved.
+ * first and their output preserved.
  *
  * See gen_partprune_steps_internal() for more details on step generation.
  *
@@ -53,6 +53,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
+#include "optimizer/var.h"
 #include "partitioning/partprune.h"
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
@@ -115,6 +116,14 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
+/*
+ * expression_tree_walker context struct for gathering the paramids in an
+ * expression.
+ */
+typedef struct PullParamContext
+{
+	Bitmapset  *params;
+} PullParamContext;
 
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
@@ -162,7 +171,10 @@ static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
 static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
 						  StrategyNumber opstrategy, Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
-static bool pull_partkey_params(PartitionPruneInfo *pinfo, List *steps);
+static Bitmapset *pull_exec_paramids(Expr *expr);
+static bool pull_exec_paramids_walker(Node *node, PullParamContext *context);
+static bool analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps,
+					  int partnatts);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
 static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
@@ -180,7 +192,7 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
  *		pruning to take place.
  *
  * Here we generate partition pruning steps for 'prunequal' and also build a
- * data stucture which allows mapping of partition indexes into 'subpaths'
+ * data structure which allows mapping of partition indexes into 'subpaths'
  * indexes.
  *
  * If no Params were found to match the partition key in any of the
@@ -194,16 +206,16 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 	RelOptInfo *targetpart = NULL;
 	ListCell   *lc;
 	List	   *pinfolist = NIL;
-	int		   *relid_subnode_map;
+	int		   *relid_subplan_map;
 	int		   *relid_subpart_map;
 	int			i;
-	bool		gotparam = false;
+	bool		doruntimeprune = false;
 
 	/*
 	 * Allocate two arrays to store the 1-based indexes of the 'subpaths' and
 	 * 'partitioned_rels' by relid.
 	 */
-	relid_subnode_map = palloc0(sizeof(int) * root->simple_rel_array_size);
+	relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
 	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
 
 	i = 1;
@@ -215,7 +227,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		Assert(IS_SIMPLE_REL(pathrel));
 		Assert(pathrel->relid < root->simple_rel_array_size);
 
-		relid_subnode_map[pathrel->relid] = i++;
+		relid_subplan_map[pathrel->relid] = i++;
 	}
 
 	/* Likewise for the partition_rels */
@@ -238,7 +250,8 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		RangeTblEntry *rte;
 		Bitmapset  *present_parts;
 		int			nparts = subpart->nparts;
-		int		   *subnode_map;
+		int			partnatts = subpart->part_scheme->partnatts;
+		int		   *subplan_map;
 		int		   *subpart_map;
 		List	   *partprunequal;
 		List	   *pruning_steps;
@@ -284,7 +297,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 			return NIL;
 		}
 
-		subnode_map = (int *) palloc(nparts * sizeof(int));
+		subplan_map = (int *) palloc(nparts * sizeof(int));
 		subpart_map = (int *) palloc(nparts * sizeof(int));
 		present_parts = NULL;
 
@@ -297,10 +310,10 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		for (i = 0; i < nparts; i++)
 		{
 			RelOptInfo *partrel = subpart->part_rels[i];
-			int			subnodeidx = relid_subnode_map[partrel->relid] - 1;
+			int			subnodeidx = relid_subplan_map[partrel->relid] - 1;
 			int			subpartidx = relid_subpart_map[partrel->relid] - 1;
 
-			subnode_map[i] = subnodeidx;
+			subplan_map[i] = subnodeidx;
 			subpart_map[i] = subpartidx;
 
 			/*
@@ -320,32 +333,27 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 		pinfo->pruning_steps = pruning_steps;
 		pinfo->present_parts = present_parts;
 		pinfo->nparts = nparts;
-		pinfo->extparams = NULL;
-		pinfo->execparams = NULL;
-		pinfo->subnode_map = subnode_map;
+		pinfo->nexprs = list_length(pruning_steps) * partnatts;
+		pinfo->subplan_map = subplan_map;
 		pinfo->subpart_map = subpart_map;
 
 		/*
-		 * Extract Params matching partition key and record if we got any.
-		 * We'll not bother enabling run-time pruning if no params matched the
-		 * partition key at any level of partitioning.
+		 * Determine when run-time pruning needs to be performed for this
+		 * partitioned table.
 		 */
-		gotparam |= pull_partkey_params(pinfo, pruning_steps);
+		doruntimeprune |= analyze_partkey_exprs(pinfo, pruning_steps,
+												partnatts);
 
 		pinfolist = lappend(pinfolist, pinfo);
 	}
 
-	pfree(relid_subnode_map);
+	pfree(relid_subplan_map);
 	pfree(relid_subpart_map);
 
-	if (gotparam)
+	if (doruntimeprune)
 		return pinfolist;
 
-	/*
-	 * If no Params were found to match the partition key on any of the
-	 * partitioned relations then there's no point doing any run-time
-	 * partition pruning.
-	 */
+	/* No run-time pruning required. */
 	return NIL;
 }
 
@@ -444,9 +452,10 @@ prune_append_rel_partitions(RelOptInfo *rel)
 	context.boundinfo = rel->boundinfo;
 
 	/* Not valid when being called from the planner */
+	context.evalexecparams = false;
 	context.planstate = NULL;
-	context.safeparams = NULL;
 	context.exprstates = NULL;
+	context.exprhasexecparam = NULL;
 
 	/* Actual pruning happens here. */
 	partindexes = get_matching_partitions(&context, pruning_steps);
@@ -1478,6 +1487,10 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		if (contain_volatile_functions((Node *) expr))
 			return PARTCLAUSE_UNSUPPORTED;
 
+		/* We can't prune using an expression with Vars. */
+		if (contain_var_clause((Node *) expr))
+			return PARTCLAUSE_UNSUPPORTED;
+
 		/*
 		 * Determine the input types of the operator we're considering.
 		 *
@@ -1655,7 +1668,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 					return PARTCLAUSE_UNSUPPORTED;
 			}
 			else
-				return PARTCLAUSE_UNSUPPORTED; /* no useful negator */
+				return PARTCLAUSE_UNSUPPORTED;	/* no useful negator */
 		}
 
 		/*
@@ -2683,54 +2696,97 @@ get_matching_range_bounds(PartitionPruneContext *context,
 }
 
 /*
- * pull_partkey_params
- *		Loop through each pruning step and record each external and exec
- *		Params being compared to the partition keys.
+ * pull_exec_paramids
+ *		Returns a Bitmapset containing the paramids of each Param with
+ *		paramkind = PARAM_EXEC in 'expr'.
+ */
+static Bitmapset *
+pull_exec_paramids(Expr *expr)
+{
+	PullParamContext context;
+
+	context.params = NULL;
+
+	pull_exec_paramids_walker((Node *) expr, &context);
+
+	return context.params;
+}
+
+static bool
+pull_exec_paramids_walker(Node *node, PullParamContext *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXEC)
+			context->params = bms_add_member(context->params, param->paramid);
+		return false;
+	}
+	return expression_tree_walker(node, pull_exec_paramids_walker,
+								  (void *) context);
+}
+
+/*
+ * analyze_partkey_exprs
+ *		Loop through each pruning step, recording which ones are comparing exec
+ *		params to the partition key.
+ *
+ * Returns true if run-time partition pruning should be attempted for this
+ * partitioned table.
  */
 static bool
-pull_partkey_params(PartitionPruneInfo *pinfo, List *steps)
+analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 {
 	ListCell   *lc;
-	bool		gotone = false;
+	bool		doruntimeprune = false;
+
+	pinfo->hasexecparam = palloc0(sizeof(bool) * pinfo->nexprs);
+	pinfo->execparams = NULL;
+	pinfo->do_initial_prune = false;
+	pinfo->do_exec_prune = false;
 
 	foreach(lc, steps)
 	{
-		PartitionPruneStepOp *stepop = lfirst(lc);
+		PartitionPruneStepOp *step = lfirst(lc);
 		ListCell   *lc2;
+		int			keyno;
 
-		if (!IsA(stepop, PartitionPruneStepOp))
+		if (!IsA(step, PartitionPruneStepOp))
 			continue;
 
-		foreach(lc2, stepop->exprs)
+		keyno = 0;
+		foreach(lc2, step->exprs)
 		{
 			Expr	   *expr = lfirst(lc2);
 
-			if (IsA(expr, Param))
+			if (!IsA(expr, Const))
 			{
-				Param	   *param = (Param *) expr;
-
-				switch (param->paramkind)
-				{
-					case PARAM_EXTERN:
-						pinfo->extparams = bms_add_member(pinfo->extparams,
-														  param->paramid);
-						break;
-					case PARAM_EXEC:
-						pinfo->execparams = bms_add_member(pinfo->execparams,
-														   param->paramid);
-						break;
+				Bitmapset  *execparams = pull_exec_paramids(expr);
+				bool		hasexecparams;
+				int			stateidx = PruneCxtStateIdx(partnatts,
+														step->step.step_id,
+														keyno);
+
+				hasexecparams = !bms_is_empty(execparams);
+				pinfo->hasexecparam[stateidx] = hasexecparams;
+				pinfo->execparams = bms_add_members(pinfo->execparams,
+													execparams);
+
+				if (!hasexecparams)
+					pinfo->do_initial_prune = true;
+				else
+					pinfo->do_exec_prune = true;
 
-					default:
-						elog(ERROR, "unrecognized paramkind: %d",
-							 (int) param->paramkind);
-						break;
-				}
-				gotone = true;
+				doruntimeprune = true;
 			}
+			keyno++;
 		}
 	}
 
-	return gotone;
+	return doruntimeprune;
 }
 
 /*
@@ -3031,37 +3087,43 @@ static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
 						Expr *expr, int stateidx, Datum *value)
 {
-	switch (nodeTag(expr))
+	if (IsA(expr, Const))
 	{
-		case T_Const:
-			*value = ((Const *) expr)->constvalue;
-			return true;
-
-		case T_Param:
+		*value = ((Const *) expr)->constvalue;
+		return true;
+	}
+	else
+	{
+		/*
+		 * When called from the executor we'll have a valid planstate so we
+		 * may be able to evaluate expressions which could not be folded to
+		 * constants during planning.  Since run-time pruning can occur both
+		 * during initialization of the executor and while it's running, we
+		 * must be careful not to attempt to evaluate expressions containing
+		 * exec params during initialization of the executor.
+		 */
+		if (context->planstate &&
+			(context->evalexecparams ||
+			 !context->exprhasexecparam[stateidx]))
+		{
+			ExprState  *exprstate;
+			ExprContext *ectx;
+			bool		isNull;
 
-			/*
-			 * When being called from the executor we may be able to evaluate
-			 * the Param's value.
-			 */
-			if (context->planstate &&
-				bms_is_member(((Param *) expr)->paramid, context->safeparams))
-			{
-				ExprState  *exprstate;
-				ExprContext *ectx;
-				bool		isNull;
+			/* Exprs with volatile functions shouldn't make it here */
+			Assert(!contain_volatile_functions((Node *) expr));
 
-				exprstate = context->exprstates[stateidx];
-				ectx = context->planstate->ps_ExprContext;
-				*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-				if (isNull)
-					return false;
+			/* Exprs with Vars shouldn't make it here either */
+			Assert(!contain_var_clause((Node *) expr));
 
-				return true;
-			}
-			break;
+			exprstate = context->exprstates[stateidx];
+			ectx = context->planstate->ps_ExprContext;
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
+			if (isNull)
+				return false;
 
-		default:
-			break;
+			return true;
+		}
 	}
 
 	return false;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fc6e9574e3..ae850f4d6e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,9 +121,9 @@ typedef struct PartitionTupleRouting
  * bypass certain subnodes when we have proofs that indicate that no tuple
  * matching the 'pruning_steps' will be found within.
  *
- * subnode_map					An array containing the subnode index which
+ * subplan_map					An array containing the subplan index which
  *								matches this partition index, or -1 if the
- *								subnode has been pruned already.
+ *								subplan has been pruned already.
  * subpart_map					An array containing the offset into the
  *								'partprunedata' array in PartitionPruning, or
  *								-1 if there is no such element in that array.
@@ -133,20 +133,24 @@ typedef struct PartitionTupleRouting
  *								the partition pruning code.
  * pruning_steps				Contains a list of PartitionPruneStep used to
  *								perform the actual pruning.
- * extparams					Contains paramids of external params found
+ * execparams					Contains paramids of exec params found
  *								matching partition keys in 'pruning_steps'.
- * allparams					As 'extparams' but also including exec params.
+ * do_initial_prune				true if pruning should be performed during
+ *								executor startup.
+ * do_exec_prune				true if pruning should be performed during
+ *								executor run.
  *-----------------------
  */
 typedef struct PartitionPruningData
 {
-	int		   *subnode_map;
+	int		   *subplan_map;
 	int		   *subpart_map;
 	Bitmapset  *present_parts;
 	PartitionPruneContext context;
 	List	   *pruning_steps;
-	Bitmapset  *extparams;
-	Bitmapset  *allparams;
+	Bitmapset  *execparams;
+	bool		do_initial_prune;
+	bool		do_exec_prune;
 } PartitionPruningData;
 
 /*-----------------------
@@ -159,28 +163,28 @@ typedef struct PartitionPruningData
  * the clauses being unable to match to any tuple that the subnode could
  * possibly produce.
  *
- * partprunedata		Array of PartitionPruningData for the node's target
- *						partitioned relation. First element contains the
- *						details for the target partitioned table.
  * num_partprunedata	Number of items in 'partprunedata' array.
+ * do_initial_prune		true if pruning should be performed during executor
+ *						startup.
  * prune_context		A memory context which can be used to call the query
  *						planner's partition prune functions.
- * extparams			All PARAM_EXTERN paramids which were found to match a
+ * execparams			All PARAM_EXEC paramids which were found to match a
  *						partition key in each of the contained
- *						PartitionPruningData structs.
- * execparams			As above but for PARAM_EXEC.
- * allparams			Union of 'extparams' and 'execparams', saved to avoid
- *						recalculation.
+ *						PartitionPruningData structs.  Pruning must be done
+ *						again each time the value of one of these parameters
+ *						changes.
+ * partprunedata		Array of PartitionPruningData for the node's target
+ *						partitioned relation. First element contains the
+ *						details for the target partitioned table.
  *-----------------------
  */
 typedef struct PartitionPruneState
 {
-	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
+	bool		do_initial_prune;
 	MemoryContext prune_context;
-	Bitmapset  *extparams;
 	Bitmapset  *execparams;
-	Bitmapset  *allparams;
+	PartitionPruningData partprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruneState;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adb159a6da..120b1c7a7b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -88,6 +88,7 @@ typedef enum NodeTag
 	T_NestLoopParam,
 	T_PlanRowMark,
 	T_PlanInvalItem,
+	T_PartitionPruneInfo,
 
 	/*
 	 * TAGS FOR PLAN STATE NODES (execnodes.h)
@@ -195,7 +196,6 @@ typedef enum NodeTag
 	T_PartitionPruneStep,
 	T_PartitionPruneStepOp,
 	T_PartitionPruneStepCombine,
-	T_PartitionPruneInfo,
 
 	/*
 	 * TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2dda82e66..0ff14c96f3 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1062,4 +1062,35 @@ typedef struct PlanInvalItem
 	uint32		hashValue;		/* hash value of object's cache lookup key */
 } PlanInvalItem;
 
+/*
+ * PartitionPruneInfo - Details required to allow the executor to prune
+ * partitions.
+ *
+ * Here we store mapping details to allow translation of a partitioned table's
+ * index as returned by the partition pruning code into subplan indexes for
+ * node types which support arbitrary numbers of subplans, such as Append.
+ * We also store various details to give indication to the executor when it
+ * should be performing partition pruning.
+ */
+typedef struct PartitionPruneInfo
+{
+	NodeTag		type;
+	Oid			reloid;			/* Oid of partition rel */
+	List	   *pruning_steps;	/* List of PartitionPruneStep */
+	Bitmapset  *present_parts;	/* Indexes of all partitions which subnodes
+								 * are present for. */
+	int			nparts;			/* The length of the following two arrays */
+	int			nexprs;			/* Size of hasexecparam array */
+	int		   *subplan_map;	/* subplan index by partition id, or -1 */
+	int		   *subpart_map;	/* subpart index by partition id, or -1 */
+	bool	   *hasexecparam;	/* true if corresponding pruning_step has an
+								 * exec Param in the Expr being compared to
+								 * the partition key. */
+	bool		do_initial_prune;	/* true if pruning should be performed
+									 * during executor startup. */
+	bool		do_exec_prune;	/* true if pruning should be performed during
+								 * executor run. */
+	Bitmapset  *execparams;		/* All exec paramids seen in prunesteps */
+} PartitionPruneInfo;
+
 #endif							/* PLANNODES_H */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b2a1..ff5c4ff8e4 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1581,27 +1581,4 @@ typedef struct PartitionPruneStepCombine
 	List	   *source_stepids;
 } PartitionPruneStepCombine;
 
-/*----------
- * PartitionPruneInfo - Details required to allow the executor to prune
- * partitions.
- *
- * Here we store mapping details to allow translation of a partitioned table's
- * index into subnode indexes for node types which support arbitrary numbers
- * of sub nodes, such as Append.
- *----------
- */
-typedef struct PartitionPruneInfo
-{
-	NodeTag		type;
-	Oid			reloid;			/* Oid of partition rel */
-	List	   *pruning_steps;	/* List of PartitionPruneStep */
-	Bitmapset  *present_parts;	/* Indexes of all partitions which subnodes
-								 * are present for. */
-	int			nparts;			/* The length of the following two arrays */
-	int		   *subnode_map;	/* subnode index by partition id, or -1 */
-	int		   *subpart_map;	/* subpart index by partition id, or -1 */
-	Bitmapset  *extparams;		/* All external paramids seen in prunesteps */
-	Bitmapset  *execparams;		/* All exec paramids seen in prunesteps */
-} PartitionPruneInfo;
-
 #endif							/* PRIMNODES_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 3d114b4c71..4ac8cf5552 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -36,6 +36,9 @@ typedef struct PartitionPruneContext
 	/* Number of partitions */
 	int			nparts;
 
+	/* true if it's safe to evaluate exec params */
+	bool		evalexecparams;
+
 	/* Partition boundary info */
 	PartitionBoundInfo boundinfo;
 
@@ -45,18 +48,15 @@ typedef struct PartitionPruneContext
 	 */
 	PlanState  *planstate;
 
-	/*
-	 * Parameters that are safe to be used for partition pruning. execparams
-	 * are not safe to use until the executor is running.
-	 */
-	Bitmapset  *safeparams;
-
 	/*
 	 * Array of ExprStates, indexed as per PruneCtxStateIdx; one for each
 	 * partkey in each pruning step.  Allocated if planstate is non-NULL,
 	 * otherwise NULL.
 	 */
 	ExprState **exprstates;
+
+	/* true if corresponding 'exprstate' expression contains an exec param */
+	bool	   *exprhasexecparam;
 } PartitionPruneContext;
 
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index cf331e79c1..64b4e933d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1835,6 +1835,54 @@ fetch backward all from cur;
 (2 rows)
 
 commit;
+begin;
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=1 loops=1)
+   Subplans Removed: 3
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(1))
+(4 rows)
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=4 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part2 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part3 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+   ->  Seq Scan on list_part4 (actual rows=1 loops=1)
+         Filter: (a = list_part_fn(a))
+(9 rows)
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   ->  Seq Scan on list_part1 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part2 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part3 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+   ->  Seq Scan on list_part4 (actual rows=0 loops=1)
+         Filter: (a = (list_part_fn(1) + a))
+         Rows Removed by Filter: 1
+(13 rows)
+
+rollback;
 drop table list_part;
 -- Parallel append
 -- Suppress the number of loops each parallel node runs for.  This is because
@@ -2079,6 +2127,40 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                                  Index Cond: (a = a.a)
 (27 rows)
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+                                      explain_parallel_append                                      
+---------------------------------------------------------------------------------------------------
+ Finalize Aggregate (actual rows=1 loops=1)
+   ->  Gather (actual rows=2 loops=1)
+         Workers Planned: 1
+         Workers Launched: 1
+         ->  Partial Aggregate (actual rows=1 loops=2)
+               ->  Nested Loop (actual rows=0 loops=2)
+                     ->  Parallel Seq Scan on lprt_a a (actual rows=51 loops=N)
+                           Filter: (a = ANY ('{0,0,1}'::integer[]))
+                     ->  Append (actual rows=0 loops=102)
+                           ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=0 loops=2)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+                           ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 (never executed)
+                                 Index Cond: (a = (a.a + 0))
+(27 rows)
+
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
                                       explain_parallel_append                                      
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 1464f4dcd9..b6681fa44c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -396,6 +396,22 @@ fetch backward all from cur;
 
 commit;
 
+begin;
+
+-- Test run-time pruning using stable functions
+create function list_part_fn(int) returns int as $$ begin return $1; end;$$ language plpgsql stable;
+
+-- Ensure pruning works using a stable function containing no Vars
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1);
+
+-- Ensure pruning does not take place when the function contains a Var parameter
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(a);
+
+-- Ensure pruning does not take place when the expression contains a Var.
+explain (analyze, costs off, summary off, timing off) select * from list_part where a = list_part_fn(1) + a;
+
+rollback;
+
 drop table list_part;
 
 -- Parallel append
@@ -486,6 +502,10 @@ set enable_mergejoin = 0;
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(0, 0, 1)');
 
+-- Ensure the same partitions are pruned when we make the nested loop
+-- parameter an Expr rather than a plain Param.
+select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a + 0 where a.a in(0, 0, 1)');
+
 insert into lprt_a values(3),(3);
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#31)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

I've made a pass over the execPartition.c and partprune.c code
attempting to resolve these issues. I have hopefully fixed them all,
but I apologise if I've missed any.
I also couldn't resist making a few other improvements to the code.

By the time this arrived, I'd already whacked around your v4 patch
quite a bit, so rather than start over I just kept going with what
I had, and then tried to merge the useful bits of this one after
the fact. I intentionally left out a couple of changes I couldn't
get excited about (such as having find_subplans_for_params_recurse
return a count), but I think I got everything in v5 otherwise.

I'm still fairly unhappy about the state of the comments, though.
It's very unclear for example what the subplan_map and subpart_map
arrays really are, eg what are they indexed by? I get the impression
that only one of them can have a non-minus-1 value for a given index,
but that's nowhere explained. Also, we have

* partprunedata Array of PartitionPruningData for the plan's target
* partitioned relation. First element contains the
* details for the target partitioned table.

And? What are the other elements, what's the index rule, is there a
specific ordering for the other elements? For that matter, "target
partitioned table" is content-free. Do you mean topmost partitioned
table? I suspect we expect the hierarchy to be flattened such that
ancestors appear before children, but that's not stated --- and if it
were, this bit about the first element would be a consequence of that.

Code-wise, there are some loose ends to be looked at.

* Related to the above, doesn't the loop at execPartition.c:1658 need
to work back-to-front? Seems like it's trying to propagate info upwards
in the hierarchy; looking at a subpartition's present_parts value when
you still might change it later doesn't look right at all.

* partkey_datum_from_expr and its caller seem pretty brain-dead with
respect to nulls. It's not even considering the possibility that a
Const has constisnull = true. Now perhaps such a case can't reach
here because of plan-time constant-folding, but I don't think this code
has any business assuming that. It's also being very stupid about
null values from expressions, just throwing up its hands and supposing
that nothing can be proven. In reality, doesn't a null guarantee we
can eliminate all partitions, since the comparison functions are
presumed strict?

* I'm fairly suspicious of the fmgr_info and fmgr_info_copy calls in
perform_pruning_base_step. Those seem likely to leak memory, and
for sure they destroy any opportunity for the called comparison
function to cache info in fn_extra --- something that's critical
for performance for some comparison functions such as array_eq.
Why is it necessary to suppose that those functions could change
from one execution to the next?

* The business with ExecFindInitialMatchingSubPlans remapping the
subplan indexes seems very dubious to me. Surely, that is adding
way more complexity and fragility than it's worth.

regards, tom lane

#33David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#32)
1 attachment(s)
Re: why partition pruning doesn't work?

Thanks for working on and pushing those fixes.

On 11 June 2018 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's very unclear for example what the subplan_map and subpart_map
arrays really are, eg what are they indexed by? I get the impression
that only one of them can have a non-minus-1 value for a given index,
but that's nowhere explained. Also, we have

They're indexed by the partition indexes as returned by the partition
pruning code. I've included a patch to fix this.

* partprunedata Array of PartitionPruningData for the plan's target
* partitioned relation. First element contains the
* details for the target partitioned table.

And? What are the other elements, what's the index rule, is there a
specific ordering for the other elements? For that matter, "target
partitioned table" is content-free. Do you mean topmost partitioned
table? I suspect we expect the hierarchy to be flattened such that
ancestors appear before children, but that's not stated --- and if it
were, this bit about the first element would be a consequence of that.

I've hopefully improved this comment in the attached patch. Although
we may need to delay this until [1] has been addressed.

Only the index of the first element is important. Looking at
add_paths_to_append_rel() the parents will always be listed before
their children. That's not very well documented either, but having
the top-level parent as the first element is relied upon for more than
runtime partition pruning.

Code-wise, there are some loose ends to be looked at.

* Related to the above, doesn't the loop at execPartition.c:1658 need
to work back-to-front? Seems like it's trying to propagate info upwards
in the hierarchy; looking at a subpartition's present_parts value when
you still might change it later doesn't look right at all.

It works just fine front-to-back. That's why I added the 2nd loop. It
propagates the work done in the first loop, so the code, as it is now,
works if the array is in any order.

It may be possible to run the first loop back-to-front and get rid of
the 2nd one. I've done this in the attached patch. I felt less
confident doing this earlier as the order of that array was not
defined well.
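
The single back-to-front pass can be modelled outside PostgreSQL. This is a minimal sketch, not the executor code: ToyPruneData, MAX_PARTS and compute_present_parts are hypothetical names, and a plain unsigned bitmask stands in for a Bitmapset. It assumes parents are stored before their children, which is the ordering the patch documents.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARTS 8

/* Hypothetical, simplified stand-in for PartitionPruningData. */
typedef struct ToyPruneData
{
	int			nparts;
	int			subplan_map[MAX_PARTS];	/* subplan index, or -1 */
	int			subpart_map[MAX_PARTS];	/* index into the array, or -1 */
	unsigned	present_parts;	/* bit j set => partition j still needed */
} ToyPruneData;

/*
 * Walk the array from the last element to the first.  Because parents are
 * stored before their children, every sub-partitioned table already has its
 * final present_parts by the time its parent looks at it.
 */
static void
compute_present_parts(ToyPruneData *data, int ndata,
					  const bool *subplan_kept)
{
	for (int i = ndata - 1; i >= 0; i--)
	{
		ToyPruneData *p = &data[i];

		p->present_parts = 0;
		for (int j = 0; j < p->nparts; j++)
		{
			int			subplan = p->subplan_map[j];
			int			subidx = p->subpart_map[j];

			if (subplan >= 0)
			{
				/* Leaf partition: present only if its subplan survived. */
				if (subplan_kept[subplan])
					p->present_parts |= 1u << j;
			}
			else if (subidx >= 0 && data[subidx].present_parts != 0)
			{
				/* Sub-partitioned table with at least one survivor. */
				p->present_parts |= 1u << j;
			}
		}
	}
}
```

With a front-to-back loop the parent would read a child's present_parts before it had been computed, which is why the original code needed the second loop.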

* partkey_datum_from_expr and its caller seem pretty brain-dead with
respect to nulls. It's not even considering the possibility that a
Const has constisnull = true. Now perhaps such a case can't reach
here because of plan-time constant-folding, but I don't think this code
has any business assuming that. It's also being very stupid about
null values from expressions, just throwing up its hands and supposing
that nothing can be proven. In reality, doesn't a null guarantee we
can eliminate all partitions, since the comparison functions are
presumed strict?

You are right. I admit to not having thought more about that and just
taking the safe option. It makes sense to fix it. I imagine it could
be a bit annoying that pruning just gives up in such a case. I've
added code for this in the attached patch.
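
To illustrate why a NULL comparison value can prune everything, here is a small stand-alone sketch. It is not the partprune.c code: ToyPruneResult and toy_prune_equal are hypothetical, and it models only a list-partitioned table with one integer value per partition and a strict equality operator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for PruneStepResult. */
typedef struct ToyPruneResult
{
	unsigned	matching_parts; /* bit i set => list partition i may match */
	bool		scan_default;
	bool		scan_null;
} ToyPruneResult;

/*
 * Prune a list-partitioned table for "key = value".  part_values[i] is the
 * single value accepted by partition i.
 */
static ToyPruneResult
toy_prune_equal(const int *part_values, int nparts, int value, bool isnull)
{
	ToyPruneResult result = {0, false, false};

	/* A strict operator never returns true for NULL input: nothing matches. */
	if (isnull)
		return result;

	for (int i = 0; i < nparts; i++)
	{
		if (part_values[i] == value)
			result.matching_parts |= 1u << i;
	}

	/* No listed value matched, so only a default partition could hold it. */
	if (result.matching_parts == 0)
		result.scan_default = true;

	return result;
}
```

The NULL case returns an empty result rather than reporting that pruning is impossible, which is the behaviour change discussed above.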

* I'm fairly suspicious of the fmgr_info and fmgr_info_copy calls in
perform_pruning_base_step. Those seem likely to leak memory, and
for sure they destroy any opportunity for the called comparison
function to cache info in fn_extra --- something that's critical
for performance for some comparison functions such as array_eq.
Why is it necessary to suppose that those functions could change
from one execution to the next?

IIRC the problem there was that we didn't think of a good place to
cache the FmgrInfo. But rethinking that now, we could probably use
the same method as I used in run-time pruning to cache the exprstate.
Amit, what do you think?

* The business with ExecFindInitialMatchingSubPlans remapping the
subplan indexes seems very dubious to me. Surely, that is adding
way more complexity and fragility than it's worth.

It's all contained in a single .c file. It does not seem that hard to
get it right. Nothing outside of execPartition.c has any business
changing anything in these arrays. Nothing else even has to look at
them.

I think having the ability to prune during executor initialisation is
enormously important. I think without it, this patch is less than half
as useful. However, if you didn't mean removing the executor
initialise pruning, and just not remapping the arrays, then I'm not
sure how we'd do that. I'd previously tried having NULL subnodes in
the Append and I thought it required too much hacking work in
explain.c, so I decided to just collapse the array instead and do what
was required to make it work after having removed the unneeded
subplans. Did you have another idea on how to do this?
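
For readers following along, the "collapse the array and remap" approach can be sketched roughly as follows. This is an illustration only: toy_collapse_subplans and toy_remap are hypothetical helpers working on plain int arrays, not the ExecFindInitialMatchingSubPlans code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Compact 'subplans' in place, keeping only entries whose 'keep' flag is set,
 * and record each old index's new position in 'new_index' (-1 if removed).
 * Returns the number of surviving subplans.
 */
static int
toy_collapse_subplans(int *subplans, const bool *keep, int nsubplans,
					  int *new_index)
{
	int			nkept = 0;

	for (int i = 0; i < nsubplans; i++)
	{
		if (keep[i])
		{
			subplans[nkept] = subplans[i];
			new_index[i] = nkept++;
		}
		else
			new_index[i] = -1;
	}
	return nkept;
}

/* Rewrite a partition-to-subplan map so it points at the compacted array. */
static void
toy_remap(int *subplan_map, int nparts, const int *new_index)
{
	for (int j = 0; j < nparts; j++)
	{
		if (subplan_map[j] >= 0)
			subplan_map[j] = new_index[subplan_map[j]];
	}
}
```

The surviving subplans end up dense, at the cost of having to rewrite every map that refers to the old indexes.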

[1]: /messages/by-id/CAKJS1f9KG5nnOFhigWm4ND5Ut-yU075iJyA+ACVyQnZ-ZW1Qtw@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

various_partition_prune_fixes.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0a003d9935..cbce6fda8e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1605,9 +1605,13 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 
 		/*
 		 * Now we can re-sequence each PartitionPruneInfo's subplan_map so
-		 * that they point to the new index of the subplan.
+		 * that they point to the new index of the subplan.  We perform this
+		 * loop in a back-to-front order so that we determine present_parts
+		 * for the lowest-level partitioned tables first.  This way we can
+		 * determine if a sub-partitioned table's partitions were entirely
+		 * pruned so we can exclude that from 'present_parts'.
 		 */
-		for (i = 0; i < prunestate->num_partprunedata; i++)
+		for (i = prunestate->num_partprunedata - 1; i >= 0 ; i--)
 		{
 			int			nparts;
 			int			j;
@@ -1627,7 +1631,7 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 			for (j = 0; j < nparts; j++)
 			{
 				int			oldidx = pprune->subplan_map[j];
-
+				int			subidx;
 				/*
 				 * If this partition existed as a subplan then change the old
 				 * subplan index to the new subplan index.  The new index may
@@ -1643,41 +1647,17 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 						pprune->present_parts =
 							bms_add_member(pprune->present_parts, j);
 				}
-			}
-		}
-
-		/*
-		 * Now we must determine which sub-partitioned tables still have
-		 * unpruned partitions.  The easiest way to do this is to simply loop
-		 * over each PartitionPruningData again checking if there are any
-		 * 'present_parts' in the sub-partitioned table.  We needn't bother
-		 * doing this if there are no sub-partitioned tables.
-		 */
-		if (prunestate->num_partprunedata > 1)
-		{
-			for (i = 0; i < prunestate->num_partprunedata; i++)
-			{
-				int			nparts;
-				int			j;
-
-				pprune = &prunestate->partprunedata[i];
-				nparts = pprune->context.nparts;
-
-				for (j = 0; j < nparts; j++)
+				else if ((subidx = pprune->subpart_map[j]) >= 0)
 				{
-					int			subidx = pprune->subpart_map[j];
-
-					if (subidx >= 0)
-					{
-						PartitionPruningData *subprune;
+					PartitionPruningData *subprune;
 
-						subprune = &prunestate->partprunedata[subidx];
+					subprune = &prunestate->partprunedata[subidx];
 
-						if (!bms_is_empty(subprune->present_parts))
-							pprune->present_parts =
-								bms_add_member(pprune->present_parts, j);
-					}
+					if (!bms_is_empty(subprune->present_parts))
+						pprune->present_parts =
+							bms_add_member(pprune->present_parts, j);
 				}
+
 			}
 		}
 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 477b9f7fb8..146298828d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1451,7 +1451,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 		/*
 		 * If we need to build partitioned_rels, accumulate the partitioned
-		 * rels for this child.
+		 * rels for this child.  We must ensure that parents are always listed
+		 * before their child partitioned tables.
 		 */
 		if (build_partitioned_rels)
 		{
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index fc0388e621..a09464dede 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -170,7 +170,7 @@ static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *cont
 static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
 							   Expr *partkey, Expr **outconst);
 static bool partkey_datum_from_expr(PartitionPruneContext *context,
-						Expr *expr, int stateidx, Datum *value);
+						Expr *expr, int stateidx, Datum *value, bool *isnull);
 
 
 /*
@@ -183,9 +183,15 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
  * data structure which allows mapping of partition indexes into 'subpaths'
  * indexes.
  *
+ * 'partition_rels' must list all RT index of all parent partitioned tables
+ * for all partitions seen in 'subpaths'.  Higher level parent partitioned
+ * tables must come before their child partitioned table, meaning the top-most
+ * parent must be first in the list.
+ *
  * If no non-Const expressions are being compared to the partition key in any
- * of the 'partitioned_rels', then we return NIL.  In such a case run-time
- * partition pruning would be useless, since the planner did it already.
+ * of the 'partitioned_rels', then we return NIL to indicate no run-time
+ * pruning should be performed.  Run-time pruning would be useless, since the
+ * pruning done during planning will have pruned everything that can be.
  */
 List *
 make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
@@ -2835,21 +2841,40 @@ perform_pruning_base_step(PartitionPruneContext *context,
 			Expr	   *expr;
 			int			stateidx;
 			Datum		datum;
+			bool		isnull;
 
 			expr = lfirst(lc1);
 			stateidx = PruneCxtStateIdx(context->partnatts,
 										opstep->step.step_id, keyno);
-			if (partkey_datum_from_expr(context, expr, stateidx, &datum))
+			if (partkey_datum_from_expr(context, expr, stateidx, &datum,
+										&isnull))
 			{
 				Oid			cmpfn;
 
+				/*
+				 * Since we only allow strict operators in pruning steps, any
+				 * NULL valued datum cannot possibily match any partitions.
+				 */
+				if (isnull)
+				{
+					PruneStepResult *result;
+
+					result = (PruneStepResult *) palloc(sizeof(PruneStepResult));
+					result->bound_offsets = NULL;
+					result->scan_default = false;
+					result->scan_null = false;
+
+					return result;
+				}
+
+				cmpfn = lfirst_oid(lc2);
+				Assert(OidIsValid(cmpfn));
+
 				/*
 				 * If we're going to need a different comparison function than
 				 * the one cached in the PartitionKey, we'll need to look up
 				 * the FmgrInfo.
 				 */
-				cmpfn = lfirst_oid(lc2);
-				Assert(OidIsValid(cmpfn));
 				if (cmpfn != context->partsupfunc[keyno].fn_oid)
 					fmgr_info(cmpfn, &partsupfunc[keyno]);
 				else
@@ -3073,7 +3098,8 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
  *
  * Evaluate 'expr', whose ExprState is stateidx of the context exprstate
  * array; set *value to the resulting Datum.  Return true if evaluation was
- * possible, otherwise false.
+ * possible, otherwise false. 'isnull' is set to indicate if the 'value' datum
+ * is SQL NULL or not.
  *
  * Note that the evaluated result may be in the per-tuple memory context of
  * context->planstate->ps_ExprContext, and we may have leaked other memory
@@ -3082,11 +3108,14 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
  */
 static bool
 partkey_datum_from_expr(PartitionPruneContext *context,
-						Expr *expr, int stateidx, Datum *value)
+						Expr *expr, int stateidx, Datum *value, bool *isnull)
 {
 	if (IsA(expr, Const))
 	{
-		*value = ((Const *) expr)->constvalue;
+		Const *con = (Const *) expr;
+
+		*value = con->constvalue;
+		*isnull = con->constisnull;
 		return true;
 	}
 	else
@@ -3105,14 +3134,10 @@ partkey_datum_from_expr(PartitionPruneContext *context,
 		{
 			ExprState  *exprstate;
 			ExprContext *ectx;
-			bool		isNull;
 
 			exprstate = context->exprstates[stateidx];
 			ectx = context->planstate->ps_ExprContext;
-			*value = ExecEvalExprSwitchContext(exprstate, ectx, &isNull);
-			if (isNull)
-				return false;
-
+			*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
 			return true;
 		}
 	}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 9f0b817c54..59176139e7 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,12 +121,21 @@ typedef struct PartitionTupleRouting
  * subplan when we can prove that no tuple matching the 'pruning_steps' will
  * be found within.
  *
- * subplan_map					An array containing the subplan index which
- *								matches this partition index, or -1 if the
- *								subplan has been pruned already.
- * subpart_map					An array containing the index into the
- *								partprunedata array in PartitionPruneState, or
- *								-1 if there is no such element in that array.
+ * subplan_map					An array indexed by the partitioned table's
+ *								partition index as returned by the pruning
+ *								code.  Contains the index of the subplan for
+ *								that partition, or -1 if that subplan does not
+ *								exists due to having been pruned away already
+ *								or the index belongs to a subpartitioned
+ *								table.
+ * subpart_map					An array indexed by the partitioned table's
+ *								partition index as returned by the pruning
+ *								code.  Contains the index of the
+ *								PartitionPruneState 'partprunedata' element
+ *								which stores the details for this
+ *								subpartitioned table, or -1 if that table was
+ *								pruned away already, or if the index belongs
+ *								to a leaf partition.
  * present_parts				A Bitmapset of the partition indexes that we
  *								have subplans mapped for.
  * context						Contains the context details required to call
@@ -161,7 +170,9 @@ typedef struct PartitionPruningData
  *
  * partprunedata		Array of PartitionPruningData for the plan's target
  *						partitioned relation. First element contains the
- *						details for the target partitioned table.
+ *						details for the target partitioned table the remaining
+ *						elements are in flattened hierarchical order with the
+ *						parents coming before their children.
  * num_partprunedata	Number of items in 'partprunedata' array.
  * do_initial_prune		true if pruning should be performed during executor
  *						startup (at any hierarchy level).
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dacc50edc2..42343de210 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -220,7 +220,10 @@ typedef struct ModifyTable
 	CmdType		operation;		/* INSERT, UPDATE, or DELETE */
 	bool		canSetTag;		/* do we set the command tag/es_processed? */
 	Index		nominalRelation;	/* Parent RT index for use of EXPLAIN */
-	/* RT indexes of non-leaf tables in a partition tree */
+	/*
+	 * A flattened hierarchical list of RT indexes of partitioned tables in
+	 * parent to child order having the top-most parent first.
+	 */
 	List	   *partitioned_rels;
 	bool		partColsUpdated;	/* some part key in hierarchy updated */
 	List	   *resultRelations;	/* integer list of RT indexes */
@@ -257,7 +260,10 @@ typedef struct Append
 	 */
 	int			first_partial_plan;
 
-	/* RT indexes of non-leaf tables in a partition tree */
+	/*
+	 * A flattened hierarchical list of RT indexes of partitioned tables in
+	 * parent to child order having the top-most parent first.
+	 */
 	List	   *partitioned_rels;
 
 	/* Info for run-time subplan pruning, one entry per partitioned_rels */
@@ -272,7 +278,10 @@ typedef struct Append
 typedef struct MergeAppend
 {
 	Plan		plan;
-	/* RT indexes of non-leaf tables in a partition tree */
+	/*
+	 * A flattened hierarchical list of RT indexes of partitioned tables in
+	 * parent to child order having the top-most parent first.
+	 */
 	List	   *partitioned_rels;
 	List	   *mergeplans;
 	/* remaining fields are just like the sort-key info in struct Sort */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3b28d1994f..78177aa687 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -689,7 +689,9 @@ typedef struct RelOptInfo
 									 * stored in the same order of bounds */
 	List	  **partexprs;		/* Non-nullable partition key expressions. */
 	List	  **nullable_partexprs; /* Nullable partition key expressions. */
-	List	   *partitioned_child_rels; /* List of RT indexes. */
+	List	   *partitioned_child_rels; /* Flattened hierarchical list of
+										 * partitioned table of RT indexes
+										 * in parent to child order. */
 } RelOptInfo;
 
 /*
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index ab32c7d67e..47c6aa4817 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2731,6 +2731,52 @@ explain (analyze, costs off, summary off, timing off)  execute q1 (1,2,2,1);
          Filter: ((b = ANY (ARRAY[$1, $2])) AND ($3 <> b) AND ($4 <> b))
 (4 rows)
 
+drop table listp_2;
+-- Ensure Params that evaulate to NULL properly prune away all partitions
+explain (analyze, costs off, summary off, timing off) select * from listp where a = (select null::int);
+                  QUERY PLAN                  
+----------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Seq Scan on listp_1_1 (never executed)
+         Filter: (a = $0)
+(5 rows)
+
+prepare q2 (int) as select * from listp where a = $1;
+execute q2 (1);
+ a | b 
+---+---
+(0 rows)
+
+execute q2 (1);
+ a | b 
+---+---
+(0 rows)
+
+execute q2 (1);
+ a | b 
+---+---
+(0 rows)
+
+execute q2 (1);
+ a | b 
+---+---
+(0 rows)
+
+execute q2 (1);
+ a | b 
+---+---
+(0 rows)
+
+explain (analyze, costs off, summary off, timing off)  execute q2 (null);
+                  QUERY PLAN                  
+----------------------------------------------
+ Append (actual rows=0 loops=1)
+   ->  Seq Scan on listp_1_1 (never executed)
+         Filter: (a = $1)
+(3 rows)
+
 drop table listp;
 -- Ensure runtime pruning works with initplans params with boolean types
 create table boolvalues (value bool not null);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 609fe09aeb..8d62b19a3d 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -685,6 +685,21 @@ explain (analyze, costs off, summary off, timing off)  execute q1 (1,2,2,0);
 -- One subplan will remain in this case, but it should not be executed.
 explain (analyze, costs off, summary off, timing off)  execute q1 (1,2,2,1);
 
+drop table listp_2;
+
+-- Ensure Params that evaulate to NULL properly prune away all partitions
+explain (analyze, costs off, summary off, timing off) select * from listp where a = (select null::int);
+
+prepare q2 (int) as select * from listp where a = $1;
+
+execute q2 (1);
+execute q2 (1);
+execute q2 (1);
+execute q2 (1);
+execute q2 (1);
+
+explain (analyze, costs off, summary off, timing off)  execute q2 (null);
+
 drop table listp;
 
 -- Ensure runtime pruning works with initplans params with boolean types
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#33)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

On 11 June 2018 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* The business with ExecFindInitialMatchingSubPlans remapping the
subplan indexes seems very dubious to me. Surely, that is adding
way more complexity and fragility than it's worth.

I think having the ability to prune during executor initialisation is
enormously important. I think without it, this patch is less than half
as useful.

Sure.

However, if you didn't mean removing the executor
initialise pruning, and just not remapping the arrays, then I'm not
sure how we'd do that. I'd previously tried having NULL subnodes in
the Append and I thought it required too much hacking work in
explain.c, so I decided to just collapse the array instead and do what
was required to make it work after having removed the unneeded
subplans. Did you have another idea on how to do this?

No, that was pretty much exactly what I was envisioning. I have
not looked at the consequences for explain.c but it seemed like
it couldn't be all that bad; and to my mind the remapping business
in the partition code is pretty bad. "It's all in one file" is not
a suitable justification for unintelligible, overcomplex code.

regards, tom lane

#35David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#34)
Re: why partition pruning doesn't work?

On 12 June 2018 at 02:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

However, if you didn't mean removing the executor
initialise pruning, and just not remapping the arrays, then I'm not
sure how we'd do that. I'd previously tried having NULL subnodes in
the Append and I thought it required too much hacking work in
explain.c, so I decided to just collapse the array instead and do what
was required to make it work after having removed the unneeded
subplans. Did you have another idea on how to do this?

No, that was pretty much exactly what I was envisioning. I have
not looked at the consequences for explain.c but it seemed like
it couldn't be all that bad; and to my mind the remapping business
in the partition code is pretty bad. "It's all in one file" is not
a suitable justification for unintelligible, overcomplex code.

By all means, please have a look then.

I've been down that route. I didn't like it. I particularly don't think
the memory fragmentation is very good grounds for a good cache hit ratio
either, not to mention the slowdown of bms_next_member when there are
large gaps in the set. Keep in mind that we may scan the Append many
times over when it's on the inside of a nested loop join.

What you're proposing exchanges logic that fits neatly into one
function for special logic that will be scattered all over explain.c.
I really don't think that's the better of the two evils.

I'd rather just see my last patch applied which simplifies the re-map
code by removing the 2nd loop. Actually, even updating the
present_parts to remove the unneeded subpartitioned table indexes is
optional. We'd just need to give up on the elog(ERROR, "partition
missing from subplans"); error and assume that case is fine.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#35)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

On 12 June 2018 at 02:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

... I'd previously tried having NULL subnodes in
the Append and I thought it required too much hacking work in
explain.c,

No, that was pretty much exactly what I was envisioning.

What you're proposing exchanges logic that fits neatly into one
function for special logic that will be scattered all over explain.c.
I really don't think that's the better of the two evils.

As far as I can see, it'd involve about three or four lines' worth of
rewrite in one place-you-already-made-quite-ugly in explain.c, and an
added if-test in planstate_walk_members, and that'd be it. That seems
like a pretty cheap price for being able to vastly simplify the invariants
for the pruning functions. In fact, I doubt you'd even need two of them
anymore; just one with a bool flag for can-use-exec-params.
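
The alternative being argued for here, leaving pruned subplans as NULL slots and skipping them when walking, might look roughly like the sketch below. ToyPlanState, toy_walk_members and toy_sum_rows are hypothetical; this does not reproduce the real planstate_walk_members signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child plan state node. */
typedef struct ToyPlanState
{
	int			rows;
} ToyPlanState;

/*
 * Visit every non-NULL member.  Pruned subplans are left as NULL slots, so
 * slot i always corresponds to the same subplan and no remapping is needed.
 * Returns the number of members visited.
 */
static int
toy_walk_members(ToyPlanState **members, int nmembers,
				 void (*visit) (ToyPlanState *, void *), void *arg)
{
	int			visited = 0;

	for (int i = 0; i < nmembers; i++)
	{
		if (members[i] == NULL)
			continue;			/* pruned during executor startup */
		visit(members[i], arg);
		visited++;
	}
	return visited;
}

/* Example visitor: add up row counts. */
static void
toy_sum_rows(ToyPlanState *ps, void *arg)
{
	*(int *) arg += ps->rows;
}
```

The trade-off is the one debated in this thread: the indexes stay stable, but every consumer of the array has to tolerate the gaps.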

I'd rather just see my last patch applied which simplifies the re-map
code by removing the 2nd loop. Actually, even updating the
present_parts to remove the unneeded subpartitioned table indexes is
optional. We'd just need to give up on the elog(ERROR, "partition
missing from subplans"); error and assume that case is fine.

The fact that you added that loop and only later decided it was
unnecessary seems to me to support my point pretty strongly: that
code is overcomplicated.

regards, tom lane

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#32)
Re: why partition pruning doesn't work?

I wrote:

* I'm fairly suspicious of the fmgr_info and fmgr_info_copy calls in
perform_pruning_base_step. Those seem likely to leak memory, and
for sure they destroy any opportunity for the called comparison
function to cache info in fn_extra --- something that's critical
for performance for some comparison functions such as array_eq.
Why is it necessary to suppose that those functions could change
from one execution to the next?

After looking closer, that code isn't just inefficient, it's flat
out broken. The reason is that ExecSetupPartitionPruneState thinks
it can store some pointers into the target relation's relcache entry
in the PartitionPruneContext, and then continue to use those pointers
after closing the relcache entry. Nope, you can't do that.

The only reason this code has appeared to pass buildfarm testing is
that when we do

if (cmpfn != context->partsupfunc[keyno].fn_oid)
fmgr_info(cmpfn, &partsupfunc[keyno]);
else ...

if the relcache entry that context->partsupfunc is pointing into
has been freed (and overwritten thanks to CLOBBER_FREED_MEMORY), then the
function OID comparison generally fails so that we do a fresh fmgr_info
call. In the field it's quite likely that we'd accept and attempt to
use a partially-clobbered FmgrInfo; but it would only happen if a relcache
flush had caused the data to get released, so it could be awhile before
anybody saw it happen, let alone reproduced it enough to figure it out.

It's easy to demonstrate that there's a problem if you instrument this
code to log when the OID comparison fails, and then run the regression
tests with -DRELCACHE_FORCE_RELEASE: you get lots of reports like

2018-06-11 18:01:28.686 EDT [16734] LOG: replace partsupfunc 2139062143 with 351
2018-06-11 18:01:28.686 EDT [16734] LOG: replace partsupfunc 2139062143 with 351
2018-06-11 18:01:28.686 EDT [16734] LOG: replace partsupfunc 2139062143 with 351
2018-06-11 18:01:28.686 EDT [16734] LOG: replace partsupfunc 2139062143 with 351
2018-06-11 18:01:28.686 EDT [16734] LOG: replace partsupfunc 2139062143 with 351
2018-06-11 18:01:28.707 EDT [16734] LOG: replace partsupfunc 2139062143 with 351
2018-06-11 18:01:28.707 EDT [16734] LOG: replace partsupfunc 2139062143 with 351

showing that context->partsupfunc has been overwritten by
CLOBBER_FREED_MEMORY.
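
The number 2139062143 in those log lines is what you get when a 4-byte field is read from memory whose bytes have all been set to 0x7F, the fill byte that CLOBBER_FREED_MEMORY builds use when wiping freed chunks. A tiny stand-alone check, where toy_read_clobbered_oid is a hypothetical helper and memset stands in for the wipe done on free:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of reading a 4-byte OID out of memory that was wiped on free.
 * Every byte is set to 0x7F, so the value reads back as 0x7F7F7F7F
 * regardless of byte order.
 */
static uint32_t
toy_read_clobbered_oid(void)
{
	uint32_t	fn_oid = 351;	/* the OID the log shows being restored */

	memset(&fn_oid, 0x7F, sizeof(fn_oid));	/* stand-in for the wipe on free */
	return fn_oid;
}
```

Since 0x7F7F7F7F is 2139062143 in decimal, the logged "old" OID is a clobber pattern, not a real function OID.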

If we had any buildfarm critters running valgrind on
RELCACHE_FORCE_RELEASE or CLOBBER_CACHE_ALWAYS builds, they'd have
detected use of uninitialized memory here ... but I don't think we have
any. (The combination of valgrind and CCA would probably be too slow to
be practical :-(, though maybe somebody with a fast machine could do
the other thing.)

Not sure about a good fix for this. It seems annoying to copy the
rel's whole partkey data structure into query-local storage, but
I'm not sure we have any choice. On the bright side, there might
be an opportunity to get rid of repeated runtime fmgr_info lookups
in cross-type comparison situations.
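
A rough sketch of the "copy into query-local storage" idea, under heavy simplification: ToyFmgrInfo and toy_copy_partsupfunc are hypothetical, malloc stands in for allocation in a query-lifetime memory context, and only two fields are modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few FmgrInfo fields that matter here. */
typedef struct ToyFmgrInfo
{
	unsigned	fn_oid;
	void	   *fn_extra;		/* per-call-site cache owned by the function */
} ToyFmgrInfo;

/*
 * Copy 'nkeys' support-function entries out of a cache-owned array into
 * storage owned by the caller.  fn_extra is reset so that the copy builds
 * its own cache instead of sharing the original's.  Returns NULL if the
 * allocation fails; the caller frees the result.
 */
static ToyFmgrInfo *
toy_copy_partsupfunc(const ToyFmgrInfo *cached, int nkeys)
{
	ToyFmgrInfo *copy = malloc(sizeof(ToyFmgrInfo) * (size_t) nkeys);

	if (copy == NULL)
		return NULL;
	for (int i = 0; i < nkeys; i++)
	{
		copy[i].fn_oid = cached[i].fn_oid;
		copy[i].fn_extra = NULL;
	}
	return copy;
}
```

Once the copy exists, the cache entry can be released or overwritten without affecting later comparisons, and the copy's fn_extra is free to persist across executions.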

regards, tom lane

#38Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: why partition pruning doesn't work?

On 06/11/2018 06:24 PM, Tom Lane wrote:

If we had any buildfarm critters running valgrind on
RELCACHE_FORCE_RELEASE or CLOBBER_CACHE_ALWAYS builds, they'd have
detected use of uninitialized memory here ... but I don't think we have
any. (The combination of valgrind and CCA would probably be too slow to
be practical :-(, though maybe somebody with a fast machine could do
the other thing.)

I don't have a fast machine, but I do have a slow machine already
running valgrind and not doing much else :-) Let's see how lousyjack
does with -DRELCACHE_FORCE_RELEASE

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#38)
Re: why partition pruning doesn't work?

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 06/11/2018 06:24 PM, Tom Lane wrote:

If we had any buildfarm critters running valgrind on
RELCACHE_FORCE_RELEASE or CLOBBER_CACHE_ALWAYS builds, they'd have
detected use of uninitialized memory here ... but I don't think we have
any. (The combination of valgrind and CCA would probably be too slow to
be practical :-(, though maybe somebody with a fast machine could do
the other thing.)

I don't have a fast machine, but I do have a slow machine already
running valgrind and not doing much else :-) Let's see how lousyjack
does with -DRELCACHE_FORCE_RELEASE

I just tried the case here, and it doesn't even get as far as any
of the partitioning tests: it bombs out in inherit.sql :-(

==00:00:06:55.816 26107== Invalid read of size 4
==00:00:06:55.816 26107== at 0x5F3978: ATExecDropInherit (tablecmds.c:11928)
==00:00:06:55.816 26107== by 0x60212A: ATExecCmd (tablecmds.c:4241)
==00:00:06:55.816 26107== by 0x602CC4: ATController (tablecmds.c:3976)
==00:00:06:55.816 26107== by 0x7910EA: ProcessUtilitySlow (utility.c:1117)
==00:00:06:55.816 26107== by 0x79180F: standard_ProcessUtility (utility.c:920)
==00:00:06:55.816 26107== by 0x78D748: PortalRunUtility (pquery.c:1178)
==00:00:06:55.816 26107== by 0x78E6D0: PortalRunMulti (pquery.c:1331)
==00:00:06:55.816 26107== by 0x78EF8F: PortalRun (pquery.c:799)
==00:00:06:55.816 26107== by 0x78B35C: exec_simple_query (postgres.c:1122)
==00:00:06:55.816 26107== by 0x78C8B3: PostgresMain (postgres.c:4153)
==00:00:06:55.816 26107== by 0x70FBD5: PostmasterMain (postmaster.c:4361)
==00:00:06:55.816 26107== by 0x67AE4F: main (main.c:228)
==00:00:06:55.816 26107== Address 0xe823e90 is 179,504 bytes inside a recently re-allocated block of size 524,288 alloc'd
==00:00:06:55.816 26107== at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:00:06:55.816 26107== by 0x8BBB35: AllocSetAlloc (aset.c:923)
==00:00:06:55.816 26107== by 0x8C4473: palloc (mcxt.c:938)
==00:00:06:55.816 26107== by 0x488DEF: CreateTemplateTupleDesc (tupdesc.c:66)
==00:00:06:55.816 26107== by 0x88D2C0: RelationBuildDesc (relcache.c:416)
==00:00:06:55.816 26107== by 0x8904B5: RelationIdGetRelation (relcache.c:1943)
==00:00:06:55.816 26107== by 0x4C93BF: relation_open (heapam.c:1135)
==00:00:06:55.816 26107== by 0x4D8305: index_open (indexam.c:154)
==00:00:06:55.816 26107== by 0x62D6EB: ExecOpenIndices (execIndexing.c:197)
==00:00:06:55.816 26107== by 0x53B607: CatalogOpenIndexes (indexing.c:49)
==00:00:06:55.816 26107== by 0x556467: recordMultipleDependencies (pg_depend.c:112)
==00:00:06:55.816 26107== by 0x560D44: create_toast_table (toasting.c:385)

That one's pretty obvious when you look at the code:

/* keep our lock on the parent relation until commit */
heap_close(parent_rel, NoLock);

ObjectAddressSet(address, RelationRelationId,
RelationGetRelid(parent_rel));

It looks like this might be a fruitful source of creepie-crawlies.
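
The problem in that snippet is purely one of ordering: the relation's OID is read after the relcache reference has been given up. A minimal standalone sketch of the safe ordering follows, assuming a hypothetical DemoRel type and demo_close() function that wipe the entry when the last reference goes away (roughly what RELCACHE_FORCE_RELEASE plus memory clobbering produces). It is an illustration, not the change that was committed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a relcache entry. */
typedef struct DemoRel
{
	uint32_t	relid;
	int			refcnt;
} DemoRel;

/*
 * Simulated close: once the last reference is dropped the entry is wiped,
 * so anything read from it afterwards is garbage.
 */
void
demo_close(DemoRel *rel)
{
	if (--rel->refcnt == 0)
		memset(rel, 0x7F, sizeof(*rel));
}

/* Safe ordering: fetch what we need first, close afterwards. */
uint32_t
demo_drop_inherit_address(DemoRel *parent)
{
	uint32_t	relid = parent->relid;	/* read while still referenced */

	demo_close(parent);
	return relid;
}
```

Swapping the two statements in demo_drop_inherit_address() reproduces the invalid read that valgrind reported.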

regards, tom lane

#40David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#37)
1 attachment(s)
Re: why partition pruning doesn't work?

On 12 June 2018 at 10:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After looking closer, that code isn't just inefficient, it's flat
out broken. The reason is that ExecSetupPartitionPruneState thinks
it can store some pointers into the target relation's relcache entry
in the PartitionPruneContext, and then continue to use those pointers
after closing the relcache entry. Nope, you can't do that.

I think the best fix is to just have a separate FmgrInfo for each step
and partkey comparison. Some FmgrInfos will end up identical, but
that's probably a small price to pay. Perhaps they should be separate
anyway so that the fn_extra is not shared between different quals
comparing to the same partition key?

I went with an array of FmgrInfos rather than an array of pointers to
FmgrInfos for cache efficiency. This does require that InvalidOid is
0, since I've palloc0'd that memory, and I'm checking if the cache is
yet to be populated with: if
(!OidIsValid(context->stepcmpfuncs[stateidx].fn_oid))
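
To make the caching scheme concrete, here is a small standalone sketch of the same idea: a zero-initialized array with one slot per (pruning step, partition key) pair, filled in on first use. The index arithmetic matches the PruneCxtStateIdx macro in the patch; everything prefixed demo_ or DEMO_ is a hypothetical stand-in, and demo_fmgr_info() merely counts calls instead of doing a catalog lookup.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int DemoOid;
#define DEMO_INVALID_OID ((DemoOid) 0)	/* must be 0 for the zeroed cache */

/* Hypothetical stand-in for FmgrInfo. */
typedef struct DemoFmgrInfo
{
	DemoOid		fn_oid;
} DemoFmgrInfo;

/* Same arithmetic as PruneCxtStateIdx in the patch. */
#define DEMO_STATE_IDX(partnatts, step_id, keyno) \
	((partnatts) * (step_id) + (keyno))

int			demo_lookups = 0;	/* number of simulated fmgr_info() calls */

/* Simulated function lookup: record the OID and count the call. */
void
demo_fmgr_info(DemoOid fn, DemoFmgrInfo *out)
{
	out->fn_oid = fn;
	demo_lookups++;
}

/* Allocate a zeroed cache with n_steps * partnatts slots. */
DemoFmgrInfo *
demo_make_cache(int n_steps, int partnatts)
{
	return calloc((size_t) n_steps * (size_t) partnatts,
				  sizeof(DemoFmgrInfo));
}

/* Return the cached entry, doing the lookup only if the slot is empty. */
DemoFmgrInfo *
demo_get_cmpfunc(DemoFmgrInfo *cache, int partnatts,
				 int step_id, int keyno, DemoOid cmpfn)
{
	int			idx = DEMO_STATE_IDX(partnatts, step_id, keyno);

	if (cache[idx].fn_oid == DEMO_INVALID_OID)
		demo_fmgr_info(cmpfn, &cache[idx]);
	return &cache[idx];
}
```

Asking twice for the same step and key returns the same slot and bumps demo_lookups only once. Keys of one step occupy consecutive slots, which is what lets a pointer to the keyno 0 entry be passed on as a per-step array.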

Patch attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

fix_bogus_fmgrinfo_initialisation.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33513ff1d1..5b74733804 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1463,19 +1463,29 @@ ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 
 		partkey = RelationGetPartitionKey(rel);
 		partdesc = RelationGetPartitionDesc(rel);
+		n_steps = list_length(pinfo->pruning_steps);
 
 		context->strategy = partkey->strategy;
 		context->partnatts = partnatts = partkey->partnatts;
-		context->partopfamily = partkey->partopfamily;
-		context->partopcintype = partkey->partopcintype;
-		context->partcollation = partkey->partcollation;
-		context->partsupfunc = partkey->partsupfunc;
+
+		context->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+		memcpy(context->partopfamily, partkey->partopfamily, sizeof(Oid) * partnatts);
+
+		context->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+		memcpy(context->partopcintype, partkey->partopcintype, sizeof(Oid) * partnatts);
+
+		context->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+		memcpy(context->partcollation, partkey->partcollation, sizeof(Oid) * partnatts);
+
+		context->stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
+													 n_steps *
+													 partnatts);
+
 		context->nparts = pinfo->nparts;
 		context->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
 		context->planstate = planstate;
 
 		/* Initialize expression state for each expression we need */
-		n_steps = list_length(pinfo->pruning_steps);
 		context->exprstates = (ExprState **)
 			palloc0(sizeof(ExprState *) * n_steps * partnatts);
 		foreach(lc2, pinfo->pruning_steps)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 856bdd3a14..44be58f7ec 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -441,7 +441,9 @@ prune_append_rel_partitions(RelOptInfo *rel)
 	context.partopfamily = rel->part_scheme->partopfamily;
 	context.partopcintype = rel->part_scheme->partopcintype;
 	context.partcollation = rel->part_scheme->partcollation;
-	context.partsupfunc = rel->part_scheme->partsupfunc;
+	context.stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
+												context.partnatts *
+												list_length(pruning_steps));
 	context.nparts = rel->nparts;
 	context.boundinfo = rel->boundinfo;
 
@@ -2809,7 +2811,8 @@ perform_pruning_base_step(PartitionPruneContext *context,
 	int			keyno,
 				nvalues;
 	Datum		values[PARTITION_MAX_KEYS];
-	FmgrInfo	partsupfunc[PARTITION_MAX_KEYS];
+	FmgrInfo	*partsupfunc;
+	int			stateidx;
 
 	/*
 	 * There better be the same number of expressions and compare functions.
@@ -2844,7 +2847,6 @@ perform_pruning_base_step(PartitionPruneContext *context,
 		if (lc1 != NULL)
 		{
 			Expr	   *expr;
-			int			stateidx;
 			Datum		datum;
 			bool		isnull;
 
@@ -2873,19 +2875,12 @@ perform_pruning_base_step(PartitionPruneContext *context,
 					return result;
 				}
 
-				/*
-				 * If we're going to need a different comparison function than
-				 * the one cached in the PartitionKey, we'll need to look up
-				 * the FmgrInfo.
-				 */
 				cmpfn = lfirst_oid(lc2);
 				Assert(OidIsValid(cmpfn));
-				if (cmpfn != context->partsupfunc[keyno].fn_oid)
-					fmgr_info(cmpfn, &partsupfunc[keyno]);
-				else
-					fmgr_info_copy(&partsupfunc[keyno],
-								   &context->partsupfunc[keyno],
-								   CurrentMemoryContext);
+
+				/* Check if we've cached the FmgrInfo yet */
+				if (!OidIsValid(context->stepcmpfuncs[stateidx].fn_oid))
+					fmgr_info(cmpfn, &context->stepcmpfuncs[stateidx]);
 
 				values[keyno] = datum;
 				nvalues++;
@@ -2896,6 +2891,14 @@ perform_pruning_base_step(PartitionPruneContext *context,
 		}
 	}
 
+	/*
+	 * Determine the stateidx for the 0th key and point the partsupfunc to
+	 * that element. This provides the correct array segment for the
+	 * strategy matching function below.
+	 */
+	stateidx = PruneCxtStateIdx(context->partnatts, opstep->step.step_id, 0);
+	partsupfunc = &context->stepcmpfuncs[stateidx];
+
 	switch (context->strategy)
 	{
 		case PARTITION_STRATEGY_HASH:
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index e3b3bfb7c1..b275c9375e 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -18,51 +18,56 @@
 #include "nodes/relation.h"
 
 
-/*
+/*-----------------------
  * PartitionPruneContext
+ *		Stores information to allow partition pruning on a single partitioned
+ *		table.
  *
- * Information about a partitioned table needed to perform partition pruning.
+ * strategy			Partition strategy, e.g. LIST, RANGE, HASH.
+ * partnatts		Number of attributes and exprs that make up the partition
+ *					key.
+ * partopfamily		Array of partnatts elements storing the opfamily of the
+ *					corresponding partition key element.
+ * partopcintype	Array of partnatts elements storing the Oid of opclass
+ *					of the corresponding partition key element.
+ * partcollation	Array of partnatts elements storing the collation of the
+ *					corresponding partition key element.
+ * stepcmpfuncs		An array to store FmgrInfo for each pruning step partition
+ *					key pair. The array should be indexed by PruneCxtStateIdx.
+ * nparts			Number of partitions belonging to this partitioned table.
+ * boundinfo		PartitionBoundInfo for the partitioned table.
+ * planstate		Holds the executor's planstate when being called during
+ *					execution, or NULL when being called from the planner.
+ * exprstates		Array of ExprStates, indexed as per PruneCxtStateIdx; one
+ *					for each partkey in each pruning step.  Allocated if
+ *					planstate is non-NULL, otherwise NULL.
+ * exprhasexecparam	Array of bools, each true if corresponding 'exprstate'
+ *					expression contains any PARAM_EXEC Params.  (Can be NULL
+ *					if planstate is NULL.)
+ * evalexecparams	True if it's safe to evaluate PARAM_EXEC Params.
+ *-----------------------
  */
 typedef struct PartitionPruneContext
 {
-	/* Partition key information */
 	char		strategy;
 	int			partnatts;
 	Oid		   *partopfamily;
 	Oid		   *partopcintype;
 	Oid		   *partcollation;
-	FmgrInfo   *partsupfunc;
-
-	/* Number of partitions */
+	FmgrInfo   *stepcmpfuncs;
 	int			nparts;
-
-	/* Partition boundary info */
 	PartitionBoundInfo boundinfo;
-
-	/*
-	 * This will be set when the context is used from the executor, to allow
-	 * Params to be evaluated.
-	 */
 	PlanState  *planstate;
-
-	/*
-	 * Array of ExprStates, indexed as per PruneCtxStateIdx; one for each
-	 * partkey in each pruning step.  Allocated if planstate is non-NULL,
-	 * otherwise NULL.
-	 */
 	ExprState **exprstates;
-
-	/*
-	 * Similar array of flags, each true if corresponding 'exprstate'
-	 * expression contains any PARAM_EXEC Params.  (Can be NULL if planstate
-	 * is NULL.)
-	 */
 	bool	   *exprhasexecparam;
-
-	/* true if it's safe to evaluate PARAM_EXEC Params */
 	bool		evalexecparams;
 } PartitionPruneContext;
 
+/*
+ * Determine a unique index into a 2-dimensional array based on the 3 inputs,
+ * where partnatts is the maximum possible value for keyno.  Consecutive
+ * keynos are consecutive array elements.
+ */
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
 	((partnatts) * (step_id) + (keyno))
 
#41Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Andrew Dunstan (#38)
Re: why partition pruning doesn't work?

On 06/11/2018 06:41 PM, Andrew Dunstan wrote:

On 06/11/2018 06:24 PM, Tom Lane wrote:

If we had any buildfarm critters running valgrind on
RELCACHE_FORCE_RELEASE or CLOBBER_CACHE_ALWAYS builds, they'd have
detected use of uninitialized memory here ... but I don't think we have
any.  (The combination of valgrind and CCA would probably be too slow to
be practical :-(, though maybe somebody with a fast machine could do
the other thing.)

I don't have a fast machine, but I do have a slow machine already
running valgrind and not doing much else :-) Let's see how lousyjack
does with -DRELCACHE_FORCE_RELEASE

It added about 20% to the run time. That's tolerable, so I'll leave it on.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#42Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#37)
Re: why partition pruning doesn't work?

On Tue, Jun 12, 2018 at 3:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not sure about a good fix for this. It seems annoying to copy the
rel's whole partkey data structure into query-local storage, but
I'm not sure we have any choice. On the bright side, there might
be an opportunity to get rid of repeated runtime fmgr_info lookups
in cross-type comparison situations.

We already do that while building part_scheme. So, if we are in
planner, it's readily available through RelOptInfo. If we need it in
the executor, we need to pass it down from RelOptInfo into one of the
execution states. I haven't looked at the patch to exactly figure out
which of these is true.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#43Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#37)
Re: why partition pruning doesn't work?

On Mon, Jun 11, 2018 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not sure about a good fix for this. It seems annoying to copy the
rel's whole partkey data structure into query-local storage, but
I'm not sure we have any choice. On the bright side, there might
be an opportunity to get rid of repeated runtime fmgr_info lookups
in cross-type comparison situations.

Is this the same issue I raised in
/messages/by-id/CA+TgmoYKToP4-adCFFRNrO21OGuH=phx-fiB1dYoqksNYX6YHQ@mail.gmail.com
or a similar issue that creeps up at execution time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#43)
Re: why partition pruning doesn't work?

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jun 11, 2018 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not sure about a good fix for this. It seems annoying to copy the
rel's whole partkey data structure into query-local storage, but
I'm not sure we have any choice. On the bright side, there might
be an opportunity to get rid of repeated runtime fmgr_info lookups
in cross-type comparison situations.

Is this the same issue I raised in
/messages/by-id/CA+TgmoYKToP4-adCFFRNrO21OGuH=phx-fiB1dYoqksNYX6YHQ@mail.gmail.com
or a similar issue that creeps up at execution time?

Well, it's related to that: *if* we held the relcache entry open for
the duration of the query, and *if* holding such a pin were sufficient
to guarantee the contents of the entry's partition data couldn't change
or even move, then we could avoid doing so much copying. But as we
discussed then, neither condition is true, and I don't think either one is
cheap to make true. Certainly there's no logic in the relcache to detect
changes of partition data like we do for, say, triggers.

regards, tom lane

#45Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#44)
Re: why partition pruning doesn't work?

On Tue, Jun 12, 2018 at 12:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jun 11, 2018 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not sure about a good fix for this. It seems annoying to copy the
rel's whole partkey data structure into query-local storage, but
I'm not sure we have any choice. On the bright side, there might
be an opportunity to get rid of repeated runtime fmgr_info lookups
in cross-type comparison situations.

Is this the same issue I raised in
/messages/by-id/CA+TgmoYKToP4-adCFFRNrO21OGuH=phx-fiB1dYoqksNYX6YHQ@mail.gmail.com
or a similar issue that creeps up at execution time?

Well, it's related to that: *if* we held the relcache entry open for
the duration of the query, and *if* holding such a pin were sufficient
to guarantee the contents of the entry's partition data couldn't change
or even move, then we could avoid doing so much copying. But as we
discussed then, neither condition is true, and I don't think either one is
cheap to make true. Certainly there's no logic in the relcache to detect
changes of partition data like we do for, say, triggers.

I think we DO hold relations open for the duration of execution
(though not necessarily between planning and execution). And there is
code in RelationClearRelation to avoid changing rd_partkey and
rd_partdesc if no logical change has occurred.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#45)
Re: why partition pruning doesn't work?

Robert Haas <robertmhaas@gmail.com> writes:

I think we DO hold relations open for the duration of execution
(though not necessarily between planning and execution). And there is
code in RelationClearRelation to avoid changing rd_partkey and
rd_partdesc if no logical change has occurred.

Testing with valgrind + RELCACHE_FORCE_RELEASE is sufficient to disprove
that, cf current results from lousyjack (which match my own testing).
The partkey *is* disappearing under us.

While I've not looked into the exact reasons for that, my first guess
is that the partitioned table is not held open because it's not one
of the ones to be scanned. Are you prepared to change something like
that at this stage of the release cycle?

regards, tom lane

#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#45)
Re: why partition pruning doesn't work?

Robert Haas <robertmhaas@gmail.com> writes:

And there is
code in RelationClearRelation to avoid changing rd_partkey and
rd_partdesc if no logical change has occurred.

Oh, and by the way, what's actually in there is

keep_partkey = (relation->rd_partkey != NULL);

I would be interested to see an explanation of why that isn't utterly
broken.

regards, tom lane

#48Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#46)
Re: why partition pruning doesn't work?

On Tue, Jun 12, 2018 at 12:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Testing with valgrind + RELCACHE_FORCE_RELEASE is sufficient to disprove
that, cf current results from lousyjack (which match my own testing).
The partkey *is* disappearing under us.

While I've not looked into the exact reasons for that, my first guess
is that the partitioned table is not held open because it's not one
of the ones to be scanned. Are you prepared to change something like
that at this stage of the release cycle?

The partition key is immutable, so it should NOT be able to disappear
out from under us. Once you have defined the partitioning strategy
for a table and the partitioning keys associated with it, you can't
ever change it. The only reason we need keep_partkey at all, as
opposed to just assume that the relevant portion of the relcache entry
can't change at all, is because during relation creation we are
briefly in a state where the pg_class row exists and the
pg_partitioned_table row hasn't been set up yet. So I think your
guess that the relation is not kept open is likely to be correct.

As for whether to change it at this point in the release cycle, I
guess that, to me, depends on how invasive the fix is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#48)
1 attachment(s)
Re: why partition pruning doesn't work?

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 12, 2018 at 12:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

While I've not looked into the exact reasons for that, my first guess
is that the partitioned table is not held open because it's not one
of the ones to be scanned. Are you prepared to change something like
that at this stage of the release cycle?

The partition key is immutable, so it should NOT be able to disappear
out from under us.

Hm. That could be better documented.

As for whether to change it at this point in the release cycle, I
guess that, to me, depends on how invasive the fix is.

It seems not to be that bad: we just need a shutdown call for the
PartitionPruneState, and then we can remember the open relation there.
The attached is based on David's patch from yesterday.
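
In outline, the lifetime rule is: take the pin when the pruning state is created, keep it in the state, and drop it in a matching shutdown call. The standalone sketch below shows just that pairing. DemoRelation, DemoPruneContext and the demo_ functions are hypothetical stand-ins rather than the executor's real types, and the reference count is a plain integer instead of a relcache pin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a relcache entry with a pin count. */
typedef struct DemoRelation
{
	int			refcnt;
	int			partnatts;
} DemoRelation;

/* Hypothetical stand-in for PartitionPruneContext. */
typedef struct DemoPruneContext
{
	DemoRelation *partrel;		/* pinned for as long as the context lives */
	int			partnatts;
} DemoPruneContext;

DemoRelation *
demo_relation_open(DemoRelation *rel)
{
	rel->refcnt++;				/* take a pin */
	return rel;
}

void
demo_relation_close(DemoRelation *rel)
{
	assert(rel->refcnt > 0);
	rel->refcnt--;				/* drop the pin */
}

/* Create: open the relation and remember it in the context. */
void
demo_create_prune_state(DemoPruneContext *cxt, DemoRelation *rel)
{
	cxt->partrel = demo_relation_open(rel);
	cxt->partnatts = cxt->partrel->partnatts;
}

/* Destroy: must run at shutdown so the pin is released exactly once. */
void
demo_destroy_prune_state(DemoPruneContext *cxt)
{
	demo_relation_close(cxt->partrel);
	cxt->partrel = NULL;
}
```

Between the two calls the pin count stays above zero, so pointers into the entry's partition data remain usable. That is the property the earlier code lost by closing the relation inside the setup function.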

I'm still a bit annoyed at the fmgr_info_copy calls in this. It'd be
better to use the FmgrInfos in the relcache when applicable. However,
mixing those with the cross-type ones would seem to require that we change
the API for get_matching_hash_bounds et al from taking "FmgrInfo *" to
taking "FmgrInfo **", which looks rather invasive.

regards, tom lane

Attachments:

fix_unsafe_relcache_partition_data_use.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33513ff..4eeee7c 100644
*** a/src/backend/executor/execPartition.c
--- b/src/backend/executor/execPartition.c
*************** adjust_partition_tlist(List *tlist, Tupl
*** 1357,1367 ****
   *
   * Functions:
   *
!  * ExecSetupPartitionPruneState:
   *		Creates the PartitionPruneState required by each of the two pruning
   *		functions.  Details stored include how to map the partition index
   *		returned by the partition pruning code into subplan indexes.
   *
   * ExecFindInitialMatchingSubPlans:
   *		Returns indexes of matching subplans.  Partition pruning is attempted
   *		without any evaluation of expressions containing PARAM_EXEC Params.
--- 1357,1370 ----
   *
   * Functions:
   *
!  * ExecCreatePartitionPruneState:
   *		Creates the PartitionPruneState required by each of the two pruning
   *		functions.  Details stored include how to map the partition index
   *		returned by the partition pruning code into subplan indexes.
   *
+  * ExecDestroyPartitionPruneState:
+  *		Deletes a PartitionPruneState. Must be called during executor shutdown.
+  *
   * ExecFindInitialMatchingSubPlans:
   *		Returns indexes of matching subplans.  Partition pruning is attempted
   *		without any evaluation of expressions containing PARAM_EXEC Params.
*************** adjust_partition_tlist(List *tlist, Tupl
*** 1382,1389 ****
   */
  
  /*
!  * ExecSetupPartitionPruneState
!  *		Set up the data structure required for calling
   *		ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
   *
   * 'planstate' is the parent plan node's execution state.
--- 1385,1392 ----
   */
  
  /*
!  * ExecCreatePartitionPruneState
!  *		Build the data structure required for calling
   *		ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
   *
   * 'planstate' is the parent plan node's execution state.
*************** adjust_partition_tlist(List *tlist, Tupl
*** 1395,1401 ****
   * in each PartitionPruneInfo.
   */
  PartitionPruneState *
! ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
  {
  	PartitionPruneState *prunestate;
  	PartitionPruningData *prunedata;
--- 1398,1404 ----
   * in each PartitionPruneInfo.
   */
  PartitionPruneState *
! ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
  {
  	PartitionPruneState *prunestate;
  	PartitionPruningData *prunedata;
*************** ExecSetupPartitionPruneState(PlanState *
*** 1435,1445 ****
  		PartitionPruningData *pprune = &prunedata[i];
  		PartitionPruneContext *context = &pprune->context;
  		PartitionDesc partdesc;
- 		Relation	rel;
  		PartitionKey partkey;
- 		ListCell   *lc2;
  		int			partnatts;
  		int			n_steps;
  
  		/*
  		 * We must copy the subplan_map rather than pointing directly to the
--- 1438,1447 ----
  		PartitionPruningData *pprune = &prunedata[i];
  		PartitionPruneContext *context = &pprune->context;
  		PartitionDesc partdesc;
  		PartitionKey partkey;
  		int			partnatts;
  		int			n_steps;
+ 		ListCell   *lc2;
  
  		/*
  		 * We must copy the subplan_map rather than pointing directly to the
*************** ExecSetupPartitionPruneState(PlanState *
*** 1456,1481 ****
  		pprune->present_parts = bms_copy(pinfo->present_parts);
  
  		/*
! 		 * Grab some info from the table's relcache; lock was already obtained
! 		 * by ExecLockNonLeafAppendTables.
  		 */
! 		rel = relation_open(pinfo->reloid, NoLock);
  
! 		partkey = RelationGetPartitionKey(rel);
! 		partdesc = RelationGetPartitionDesc(rel);
  
  		context->strategy = partkey->strategy;
  		context->partnatts = partnatts = partkey->partnatts;
! 		context->partopfamily = partkey->partopfamily;
! 		context->partopcintype = partkey->partopcintype;
  		context->partcollation = partkey->partcollation;
  		context->partsupfunc = partkey->partsupfunc;
! 		context->nparts = pinfo->nparts;
! 		context->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
  		context->planstate = planstate;
  
  		/* Initialize expression state for each expression we need */
- 		n_steps = list_length(pinfo->pruning_steps);
  		context->exprstates = (ExprState **)
  			palloc0(sizeof(ExprState *) * n_steps * partnatts);
  		foreach(lc2, pinfo->pruning_steps)
--- 1458,1490 ----
  		pprune->present_parts = bms_copy(pinfo->present_parts);
  
  		/*
! 		 * We need to hold a pin on the partitioned table's relcache entry so
! 		 * that we can rely on its copies of the table's partition key and
! 		 * partition descriptor.  We need not get a lock though; one should
! 		 * have been acquired already by InitPlan or
! 		 * ExecLockNonLeafAppendTables.
  		 */
! 		context->partrel = relation_open(pinfo->reloid, NoLock);
  
! 		partkey = RelationGetPartitionKey(context->partrel);
! 		partdesc = RelationGetPartitionDesc(context->partrel);
! 		n_steps = list_length(pinfo->pruning_steps);
  
  		context->strategy = partkey->strategy;
  		context->partnatts = partnatts = partkey->partnatts;
! 		context->nparts = pinfo->nparts;
! 		context->boundinfo = partdesc->boundinfo;
  		context->partcollation = partkey->partcollation;
  		context->partsupfunc = partkey->partsupfunc;
! 
! 		/* We'll look up type-specific support functions as needed */
! 		context->stepcmpfuncs = (FmgrInfo *)
! 			palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
! 
! 		context->ppccontext = CurrentMemoryContext;
  		context->planstate = planstate;
  
  		/* Initialize expression state for each expression we need */
  		context->exprstates = (ExprState **)
  			palloc0(sizeof(ExprState *) * n_steps * partnatts);
  		foreach(lc2, pinfo->pruning_steps)
*************** ExecSetupPartitionPruneState(PlanState *
*** 1527,1534 ****
  		prunestate->execparamids = bms_add_members(prunestate->execparamids,
  												   pinfo->execparamids);
  
- 		relation_close(rel, NoLock);
- 
  		i++;
  	}
  
--- 1536,1541 ----
*************** ExecSetupPartitionPruneState(PlanState *
*** 1536,1541 ****
--- 1543,1568 ----
  }
  
  /*
+  * ExecDestroyPartitionPruneState
+  *		Release resources at plan shutdown.
+  *
+  * We don't bother to free any memory here, since the whole executor context
+  * will be going away shortly.  We do need to release our relcache pins.
+  */
+ void
+ ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
+ {
+ 	int			i;
+ 
+ 	for (i = 0; i < prunestate->num_partprunedata; i++)
+ 	{
+ 		PartitionPruningData *pprune = &prunestate->partprunedata[i];
+ 
+ 		relation_close(pprune->context.partrel, NoLock);
+ 	}
+ }
+ 
+ /*
   * ExecFindInitialMatchingSubPlans
   *		Identify the set of subplans that cannot be eliminated by initial
   *		pruning (disregarding any pruning constraints involving PARAM_EXEC
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6dd53e9..5ce4fb4 100644
*** a/src/backend/executor/nodeAppend.c
--- b/src/backend/executor/nodeAppend.c
*************** ExecInitAppend(Append *node, EState *est
*** 136,143 ****
  		/* We may need an expression context to evaluate partition exprs */
  		ExecAssignExprContext(estate, &appendstate->ps);
  
! 		prunestate = ExecSetupPartitionPruneState(&appendstate->ps,
! 												  node->part_prune_infos);
  
  		/* Perform an initial partition prune, if required. */
  		if (prunestate->do_initial_prune)
--- 136,145 ----
  		/* We may need an expression context to evaluate partition exprs */
  		ExecAssignExprContext(estate, &appendstate->ps);
  
! 		/* Create the working data structure for pruning. */
! 		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
! 												   node->part_prune_infos);
! 		appendstate->as_prune_state = prunestate;
  
  		/* Perform an initial partition prune, if required. */
  		if (prunestate->do_initial_prune)
*************** ExecInitAppend(Append *node, EState *est
*** 178,185 ****
  		 */
  		if (!prunestate->do_exec_prune)
  			appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
- 
- 		appendstate->as_prune_state = prunestate;
  	}
  	else
  	{
--- 180,185 ----
*************** ExecEndAppend(AppendState *node)
*** 330,335 ****
--- 330,341 ----
  	 */
  	for (i = 0; i < nplans; i++)
  		ExecEndNode(appendplans[i]);
+ 
+ 	/*
+ 	 * release any resources associated with run-time pruning
+ 	 */
+ 	if (node->as_prune_state)
+ 		ExecDestroyPartitionPruneState(node->as_prune_state);
  }
  
  void
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 856bdd3..ee5826e 100644
*** a/src/backend/partitioning/partprune.c
--- b/src/backend/partitioning/partprune.c
*************** prune_append_rel_partitions(RelOptInfo *
*** 436,449 ****
  	if (contradictory)
  		return NULL;
  
  	context.strategy = rel->part_scheme->strategy;
  	context.partnatts = rel->part_scheme->partnatts;
- 	context.partopfamily = rel->part_scheme->partopfamily;
- 	context.partopcintype = rel->part_scheme->partopcintype;
- 	context.partcollation = rel->part_scheme->partcollation;
- 	context.partsupfunc = rel->part_scheme->partsupfunc;
  	context.nparts = rel->nparts;
  	context.boundinfo = rel->boundinfo;
  
  	/* These are not valid when being called from the planner */
  	context.planstate = NULL;
--- 436,453 ----
  	if (contradictory)
  		return NULL;
  
+ 	/* Set up PartitionPruneContext */
+ 	context.partrel = NULL;
  	context.strategy = rel->part_scheme->strategy;
  	context.partnatts = rel->part_scheme->partnatts;
  	context.nparts = rel->nparts;
  	context.boundinfo = rel->boundinfo;
+ 	context.partcollation = rel->part_scheme->partcollation;
+ 	context.partsupfunc = rel->part_scheme->partsupfunc;
+ 	context.stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
+ 												context.partnatts *
+ 												list_length(pruning_steps));
+ 	context.ppccontext = CurrentMemoryContext;
  
  	/* These are not valid when being called from the planner */
  	context.planstate = NULL;
*************** perform_pruning_base_step(PartitionPrune
*** 2809,2815 ****
  	int			keyno,
  				nvalues;
  	Datum		values[PARTITION_MAX_KEYS];
! 	FmgrInfo	partsupfunc[PARTITION_MAX_KEYS];
  
  	/*
  	 * There better be the same number of expressions and compare functions.
--- 2813,2820 ----
  	int			keyno,
  				nvalues;
  	Datum		values[PARTITION_MAX_KEYS];
! 	FmgrInfo   *partsupfunc;
! 	int			stateidx;
  
  	/*
  	 * There better be the same number of expressions and compare functions.
*************** perform_pruning_base_step(PartitionPrune
*** 2844,2850 ****
  		if (lc1 != NULL)
  		{
  			Expr	   *expr;
- 			int			stateidx;
  			Datum		datum;
  			bool		isnull;
  
--- 2849,2854 ----
*************** perform_pruning_base_step(PartitionPrune
*** 2873,2891 ****
  					return result;
  				}
  
! 				/*
! 				 * If we're going to need a different comparison function than
! 				 * the one cached in the PartitionKey, we'll need to look up
! 				 * the FmgrInfo.
! 				 */
  				cmpfn = lfirst_oid(lc2);
  				Assert(OidIsValid(cmpfn));
! 				if (cmpfn != context->partsupfunc[keyno].fn_oid)
! 					fmgr_info(cmpfn, &partsupfunc[keyno]);
! 				else
! 					fmgr_info_copy(&partsupfunc[keyno],
! 								   &context->partsupfunc[keyno],
! 								   CurrentMemoryContext);
  
  				values[keyno] = datum;
  				nvalues++;
--- 2877,2901 ----
  					return result;
  				}
  
! 				/* Set up the stepcmpfuncs entry, unless we already did */
  				cmpfn = lfirst_oid(lc2);
  				Assert(OidIsValid(cmpfn));
! 				if (cmpfn != context->stepcmpfuncs[stateidx].fn_oid)
! 				{
! 					/*
! 					 * If the needed support function is the same one cached
! 					 * in the relation's partition key, copy the cached
! 					 * FmgrInfo.  Otherwise (i.e., when we have a cross-type
! 					 * comparison), an actual lookup is required.
! 					 */
! 					if (cmpfn == context->partsupfunc[keyno].fn_oid)
! 						fmgr_info_copy(&context->stepcmpfuncs[stateidx],
! 									   &context->partsupfunc[keyno],
! 									   context->ppccontext);
! 					else
! 						fmgr_info_cxt(cmpfn, &context->stepcmpfuncs[stateidx],
! 									  context->ppccontext);
! 				}
  
  				values[keyno] = datum;
  				nvalues++;
*************** perform_pruning_base_step(PartitionPrune
*** 2896,2901 ****
--- 2906,2918 ----
  		}
  	}
  
+ 	/*
+ 	 * Point partsupfunc to the entry for the 0th key of this step; the
+ 	 * additional support functions, if any, follow consecutively.
+ 	 */
+ 	stateidx = PruneCxtStateIdx(context->partnatts, opstep->step.step_id, 0);
+ 	partsupfunc = &context->stepcmpfuncs[stateidx];
+ 
  	switch (context->strategy)
  	{
  		case PARTITION_STRATEGY_HASH:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3dfb1b8..dc69280 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** RelationClearRelation(Relation relation,
*** 2401,2406 ****
--- 2401,2407 ----
  		keep_tupdesc = equalTupleDescs(relation->rd_att, newrel->rd_att);
  		keep_rules = equalRuleLocks(relation->rd_rules, newrel->rd_rules);
  		keep_policies = equalRSDesc(relation->rd_rsdesc, newrel->rd_rsdesc);
+ 		/* partkey is immutable once set up, so we can always keep it */
  		keep_partkey = (relation->rd_partkey != NULL);
  		keep_partdesc = equalPartitionDescs(relation->rd_partkey,
  											relation->rd_partdesc,
*************** RelationClearRelation(Relation relation,
*** 2445,2451 ****
  		SWAPFIELD(Form_pg_class, rd_rel);
  		/* ... but actually, we don't have to update newrel->rd_rel */
  		memcpy(relation->rd_rel, newrel->rd_rel, CLASS_TUPLE_SIZE);
! 		/* preserve old tupledesc and rules if no logical change */
  		if (keep_tupdesc)
  			SWAPFIELD(TupleDesc, rd_att);
  		if (keep_rules)
--- 2446,2452 ----
  		SWAPFIELD(Form_pg_class, rd_rel);
  		/* ... but actually, we don't have to update newrel->rd_rel */
  		memcpy(relation->rd_rel, newrel->rd_rel, CLASS_TUPLE_SIZE);
! 		/* preserve old tupledesc, rules, policies if no logical change */
  		if (keep_tupdesc)
  			SWAPFIELD(TupleDesc, rd_att);
  		if (keep_rules)
*************** RelationClearRelation(Relation relation,
*** 2459,2471 ****
  		SWAPFIELD(Oid, rd_toastoid);
  		/* pgstat_info must be preserved */
  		SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
! 		/* partition key must be preserved, if we have one */
  		if (keep_partkey)
  		{
  			SWAPFIELD(PartitionKey, rd_partkey);
  			SWAPFIELD(MemoryContext, rd_partkeycxt);
  		}
- 		/* preserve old partdesc if no logical change */
  		if (keep_partdesc)
  		{
  			SWAPFIELD(PartitionDesc, rd_partdesc);
--- 2460,2471 ----
  		SWAPFIELD(Oid, rd_toastoid);
  		/* pgstat_info must be preserved */
  		SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
! 		/* preserve old partitioning info if no logical change */
  		if (keep_partkey)
  		{
  			SWAPFIELD(PartitionKey, rd_partkey);
  			SWAPFIELD(MemoryContext, rd_partkeycxt);
  		}
  		if (keep_partdesc)
  		{
  			SWAPFIELD(PartitionDesc, rd_partdesc);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 71d639f..862bf65 100644
*** a/src/include/executor/execPartition.h
--- b/src/include/executor/execPartition.h
*************** extern HeapTuple ConvertPartitionTupleSl
*** 208,215 ****
  						  TupleTableSlot **p_my_slot);
  extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
  						PartitionTupleRouting *proute);
! extern PartitionPruneState *ExecSetupPartitionPruneState(PlanState *planstate,
! 							 List *partitionpruneinfo);
  extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
  extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
  								int nsubplans);
--- 208,216 ----
  						  TupleTableSlot **p_my_slot);
  extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
  						PartitionTupleRouting *proute);
! extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
! 							  List *partitionpruneinfo);
! extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
  extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
  extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
  								int nsubplans);
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index e3b3bfb..09147b5 100644
*** a/src/include/partitioning/partprune.h
--- b/src/include/partitioning/partprune.h
***************
*** 20,68 ****
  
  /*
   * PartitionPruneContext
   *
!  * Information about a partitioned table needed to perform partition pruning.
   */
  typedef struct PartitionPruneContext
  {
! 	/* Partition key information */
  	char		strategy;
  	int			partnatts;
- 	Oid		   *partopfamily;
- 	Oid		   *partopcintype;
- 	Oid		   *partcollation;
- 	FmgrInfo   *partsupfunc;
- 
- 	/* Number of partitions */
  	int			nparts;
- 
- 	/* Partition boundary info */
  	PartitionBoundInfo boundinfo;
! 
! 	/*
! 	 * This will be set when the context is used from the executor, to allow
! 	 * Params to be evaluated.
! 	 */
  	PlanState  *planstate;
- 
- 	/*
- 	 * Array of ExprStates, indexed as per PruneCtxStateIdx; one for each
- 	 * partkey in each pruning step.  Allocated if planstate is non-NULL,
- 	 * otherwise NULL.
- 	 */
  	ExprState **exprstates;
- 
- 	/*
- 	 * Similar array of flags, each true if corresponding 'exprstate'
- 	 * expression contains any PARAM_EXEC Params.  (Can be NULL if planstate
- 	 * is NULL.)
- 	 */
  	bool	   *exprhasexecparam;
- 
- 	/* true if it's safe to evaluate PARAM_EXEC Params */
  	bool		evalexecparams;
  } PartitionPruneContext;
  
  #define PruneCxtStateIdx(partnatts, step_id, keyno) \
  	((partnatts) * (step_id) + (keyno))
  
--- 20,76 ----
  
  /*
   * PartitionPruneContext
+  *		Stores information needed at runtime for pruning computations
+  *		related to a single partitioned table.
   *
!  * partrel			Relcache pointer for the partitioned table,
!  *					if we have it open (else NULL).
!  * strategy			Partition strategy, e.g. LIST, RANGE, HASH.
!  * partnatts		Number of columns in the partition key.
!  * nparts			Number of partitions in this partitioned table.
!  * boundinfo		Partition boundary info for the partitioned table.
!  * partcollation	Array of partnatts elements, storing the collations of the
!  *					partition key columns.
!  * partsupfunc		Array of FmgrInfos for the comparison or hashing functions
!  *					associated with the partition keys (partnatts elements).
!  *					(This points into the partrel's partition key, typically.)
!  * stepcmpfuncs		Array of FmgrInfos for the comparison or hashing function
!  *					for each pruning step and partition key.
!  * ppccontext		Memory context holding this PartitionPruneContext's
!  *					subsidiary data, such as the FmgrInfos.
!  * planstate		Points to the parent plan node's PlanState when called
!  *					during execution; NULL when called from the planner.
!  * exprstates		Array of ExprStates, indexed as per PruneCxtStateIdx; one
!  *					for each partition key in each pruning step.  Allocated if
!  *					planstate is non-NULL, otherwise NULL.
!  * exprhasexecparam	Array of bools, each true if corresponding 'exprstate'
!  *					expression contains any PARAM_EXEC Params.  (Can be NULL
!  *					if planstate is NULL.)
!  * evalexecparams	True if it's safe to evaluate PARAM_EXEC Params.
   */
  typedef struct PartitionPruneContext
  {
! 	Relation	partrel;
  	char		strategy;
  	int			partnatts;
  	int			nparts;
  	PartitionBoundInfo boundinfo;
! 	Oid		   *partcollation;
! 	FmgrInfo   *partsupfunc;
! 	FmgrInfo   *stepcmpfuncs;
! 	MemoryContext ppccontext;
  	PlanState  *planstate;
  	ExprState **exprstates;
  	bool	   *exprhasexecparam;
  	bool		evalexecparams;
  } PartitionPruneContext;
  
+ /*
+  * PruneCxtStateIdx() computes the correct index into the stepcmpfuncs[],
+  * exprstates[] and exprhasexecparam[] arrays for step step_id and
+  * partition key column keyno.  (Note: there is code that assumes the
+  * entries for a given step are sequential, so this is not chosen freely.)
+  */
  #define PruneCxtStateIdx(partnatts, step_id, keyno) \
  	((partnatts) * (step_id) + (keyno))
  
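
The indexing convention documented in that last comment can be tried outside PostgreSQL. This is only a rough sketch: PruneCxtStateIdx is copied from the patch, while step_base and the use of plain ints in place of FmgrInfo entries are assumptions made for illustration.

```c
#include <assert.h>

/* Same arithmetic as the macro in partprune.h. */
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
	((partnatts) * (step_id) + (keyno))

/*
 * Hypothetical helper: return a pointer to the first entry belonging to
 * step_id.  Because the entries of one step are stored consecutively, the
 * caller can index the result by keyno, much as perform_pruning_base_step
 * does with its partsupfunc pointer.
 */
static int *
step_base(int *entries, int partnatts, int step_id)
{
	assert(partnatts > 0 && step_id >= 0);
	return &entries[PruneCxtStateIdx(partnatts, step_id, 0)];
}
```

With partnatts = 2, step 2 starts at index 4, so step_base(entries, 2, 2)[1] refers to entries[5].
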
#50David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#49)
Re: why partition pruning doesn't work?

On 13 June 2018 at 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems not to be that bad: we just need a shutdown call for the
PartitionPruneState, and then we can remember the open relation there.
The attached is based on David's patch from yesterday.

I'm still a bit annoyed at the fmgr_info_copy calls in this. It'd be
better to use the FmgrInfos in the relcache when applicable. However,
mixing those with the cross-type ones would seem to require that we change
the API for get_matching_hash_bounds et al from taking "FmgrInfo *" to
taking "FmgrInfo **", which looks rather invasive.

I've looked over this and it seems better than mine. Especially so
that you've done things so that the FmgrInfo is copied into a memory
context that's not about to get reset.
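
As I read the patch, the per-step entries are filled lazily: the array starts zeroed, and an entry is set up only when its stored function OID differs from the one the step needs. A minimal standalone sketch of that logic follows. MockFmgrInfo, setup_step_cmpfunc and catalog_lookups are hypothetical names, and the "lookup" is just a counter standing in for fmgr_info_cxt.

```c
#include <assert.h>

typedef unsigned int Oid;		/* stand-in for the PostgreSQL typedef */
#define InvalidOid ((Oid) 0)

/* Reduced stand-in for FmgrInfo; InvalidOid means "not set up yet". */
typedef struct MockFmgrInfo
{
	Oid			fn_oid;
} MockFmgrInfo;

/* Counts how often the expensive lookup path was taken. */
static int	catalog_lookups = 0;

static void
setup_step_cmpfunc(MockFmgrInfo *stepcmpfuncs, const MockFmgrInfo *partsupfunc,
				   int stateidx, int keyno, Oid cmpfn)
{
	MockFmgrInfo *entry = &stepcmpfuncs[stateidx];

	assert(cmpfn != InvalidOid);

	/* Nothing to do if this step's entry was already set up. */
	if (entry->fn_oid == cmpfn)
		return;

	if (partsupfunc[keyno].fn_oid == cmpfn)
		*entry = partsupfunc[keyno];	/* same function: copy cached info */
	else
	{
		entry->fn_oid = cmpfn;	/* cross-type case: do a real lookup */
		catalog_lookups++;
	}
}
```

Calling it a second time for the same step and function returns early, so the lookup cost is paid at most once per entry.
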

One small thing is that I'd move the:

context.partrel = NULL;

to under:

/* These are not valid when being called from the planner */

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#51Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#49)
Re: why partition pruning doesn't work?

On Wed, Jun 13, 2018 at 12:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

As for whether to change it at this point in the release cycle, I
guess that, to me, depends on how invasive the fix is.

It seems not to be that bad: we just need a shutdown call for the
PartitionPruneState, and then we can remember the open relation there.
The attached is based on David's patch from yesterday.

Seems reasonable. Really, I think we should look for a way to hang
onto the relation at the point where it's originally opened and locked
instead of reopening it here. But that's probably more invasive than
we can really justify right at the moment, and I think this is a step
in a good direction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#51)
Re: why partition pruning doesn't work?

Robert Haas <robertmhaas@gmail.com> writes:

Seems reasonable. Really, I think we should look for a way to hang
onto the relation at the point where it's originally opened and locked
instead of reopening it here. But that's probably more invasive than
we can really justify right at the moment, and I think this is a step
in a good direction.

The existing coding there makes me itch a bit, because there's only a
rather fragile line of reasoning justifying the assumption that there is a
pre-existing lock at all. So I'd be in favor of what you suggest just to
get rid of the "open(NoLock)" hazard. But I agree that it'd be rather
invasive and right now is probably not the time for it.

regards, tom lane

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#50)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

On 13 June 2018 at 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems not to be that bad: we just need a shutdown call for the
PartitionPruneState, and then we can remember the open relation there.
The attached is based on David's patch from yesterday.

I've looked over this and it seems better than mine. Especially so
that you've done things so that the FmgrInfo is copied into a memory
context that's not about to get reset.

Pushed, thanks for looking it over.

One small thing is that I'd move the:
context.partrel = NULL;
to under:
/* These are not valid when being called from the planner */

Judgment call I guess. I tend to initialize struct fields in field order
unless there's a good reason to do differently, just so it's easier to
confirm that none were overlooked. But I can see the point of your
suggestion too, so done that way.

There's still one thing I'm a bit confused about here. I noticed that
we weren't actually using the partopfamily and partopcintype fields in
PartitionPruneContext, so I removed them. But that still leaves both
partsupfunc and partcollation as pointers into the relcache that were
subject to this hazard. My testing agrees with lousyjack's results
that both of those were, in fact, being improperly accessed. The OID
comparison effect I mentioned upthread explains why the buildfarm's
cache-clobbering members failed to notice any problem with garbage
partsupfunc data ... but why did we not see any complaints about invalid
collation OIDs? Tis strange.

regards, tom lane

#54Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#52)
1 attachment(s)
Re: why partition pruning doesn't work?

On 2018/06/13 23:39, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Seems reasonable. Really, I think we should look for a way to hang
onto the relation at the point where it's originally opened and locked
instead of reopening it here. But that's probably more invasive than
we can really justify right at the moment, and I think this is a step
in a good direction.

The existing coding there makes me itch a bit, because there's only a
rather fragile line of reasoning justifying the assumption that there is a
pre-existing lock at all. So I'd be in favor of what you suggest just to
get rid of the "open(NoLock)" hazard. But I agree that it'd be rather
invasive and right now is probably not the time for it.

I had sent a patch to try to get rid of the open(NoLock) there a couple of
months ago [1]. The idea was to both lock and open the relation in
ExecNonLeafAppendTables, which is the first time all partitioned tables in
a given Append node are locked for execution. Also, the patch makes it a
responsibility of ExecEndAppend to release the relcache pins, so the
recently added ExecDestroyPartitionPruneState would not be needed.

Attached is a rebased version of that patch if there is interest in it.

Thanks,
Amit

[1]: /messages/by-id/0b361a22-f995-e15c-a385-6d1b72dd0d13@lab.ntt.co.jp

Attachments:

open-partitioned-rels-in-ExecNonLeafAppendTables-1.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..ea6b05934b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1401,7 +1401,8 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * in each PartitionPruneInfo.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+							  Relation *partitioned_rels)
 {
 	PartitionPruneState *prunestate;
 	PartitionPruningData *prunedata;
@@ -1440,6 +1441,7 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
 		PartitionPruningData *pprune = &prunedata[i];
 		PartitionPruneContext *context = &pprune->context;
+		Relation	rel;
 		PartitionDesc partdesc;
 		PartitionKey partkey;
 		int			partnatts;
@@ -1460,17 +1462,11 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		/* present_parts is also subject to later modification */
 		pprune->present_parts = bms_copy(pinfo->present_parts);
 
-		/*
-		 * We need to hold a pin on the partitioned table's relcache entry so
-		 * that we can rely on its copies of the table's partition key and
-		 * partition descriptor.  We need not get a lock though; one should
-		 * have been acquired already by InitPlan or
-		 * ExecLockNonLeafAppendTables.
-		 */
-		context->partrel = relation_open(pinfo->reloid, NoLock);
-
-		partkey = RelationGetPartitionKey(context->partrel);
-		partdesc = RelationGetPartitionDesc(context->partrel);
+		Assert(partitioned_rels[i] != NULL);
+		rel = partitioned_rels[i];
+		Assert(RelationGetRelid(rel) == pinfo->reloid);
+		partkey = RelationGetPartitionKey(rel);
+		partdesc = RelationGetPartitionDesc(rel);
 		n_steps = list_length(pinfo->pruning_steps);
 
 		context->strategy = partkey->strategy;
@@ -1546,26 +1542,6 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 }
 
 /*
- * ExecDestroyPartitionPruneState
- *		Release resources at plan shutdown.
- *
- * We don't bother to free any memory here, since the whole executor context
- * will be going away shortly.  We do need to release our relcache pins.
- */
-void
-ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
-{
-	int			i;
-
-	for (i = 0; i < prunestate->num_partprunedata; i++)
-	{
-		PartitionPruningData *pprune = &prunestate->partprunedata[i];
-
-		relation_close(pprune->context.partrel, NoLock);
-	}
-}
-
-/*
  * ExecFindInitialMatchingSubPlans
  *		Identify the set of subplans that cannot be eliminated by initial
  *		pruning (disregarding any pruning constraints involving PARAM_EXEC
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..87809054ed 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -858,22 +858,52 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 /*
  * ExecLockNonLeafAppendTables
  *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list.  These are the non-leaf tables in
+ * the partition tree controlled by a given Append or MergeAppend node.
  */
 void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
+ExecLockNonLeafAppendTables(PlanState *planstate,
+							EState *estate,
+							List *partitioned_rels)
 {
 	PlannedStmt *stmt = estate->es_plannedstmt;
 	ListCell   *lc;
+	int			i;
 
+	if (partitioned_rels == NIL)
+		return;
+
+	/*
+	 * If we're going to be performing pruning, allocate space for Relation
+	 * pointers to be used later when setting up partition pruning state in
+	 * ExecCreatePartitionPruneState.
+	 */
+	if (IsA(planstate, AppendState))
+	{
+		AppendState *appendstate = (AppendState *) planstate;
+		Append *node = (Append *) planstate->plan;
+
+		if (node->part_prune_infos != NIL)
+		{
+			Assert(list_length(node->part_prune_infos) ==
+				   list_length(partitioned_rels));
+			appendstate->partitioned_rels = (Relation *)
+								palloc(sizeof(Relation) *
+									   list_length(node->part_prune_infos));
+			appendstate->num_partitioned_rels =
+									   list_length(node->part_prune_infos);
+		}
+	}
+
+	i = 0;
 	foreach(lc, partitioned_rels)
 	{
 		ListCell   *l;
 		Index		rti = lfirst_int(lc);
 		bool		is_result_rel = false;
 		Oid			relid = getrelid(rti, estate->es_range_table);
+		int			lockmode;
 
 		/* If this is a result relation, already locked in InitPlan */
 		foreach(l, stmt->nonleafResultRelations)
@@ -903,9 +933,39 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 			}
 
 			if (rc && RowMarkRequiresRowShareLock(rc->markType))
-				LockRelationOid(relid, RowShareLock);
+				lockmode = RowShareLock;
 			else
-				LockRelationOid(relid, AccessShareLock);
+				lockmode = AccessShareLock;
+			switch (nodeTag(planstate))
+			{
+				/*
+				 * We need to also hold a pin on the partitioned table's
+				 * relcache entry so that we can rely on its copies of the
+				 * table's partition key and partition descriptor, which
+				 * are used when setting up partition pruning state.
+				 */
+				case T_AppendState:
+					{
+						AppendState *appendstate = (AppendState *) planstate;
+
+						if (appendstate->partitioned_rels)
+							appendstate->partitioned_rels[i] =
+												heap_open(relid, lockmode);
+						else
+							LockRelationOid(relid, lockmode);
+						i++;
+					}
+
+				/* Just lock here; there is no pruning support. */
+				case T_MergeAppendState:
+					LockRelationOid(relid, lockmode);
+					break;
+
+				default:
+					elog(ERROR, "invalid PlanState node: %d",
+						 nodeTag(planstate));
+					break;
+			}
 		}
 	}
 }
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..5ce4601a35 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -113,18 +113,19 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	Assert(!(eflags & EXEC_FLAG_MARK));
 
 	/*
-	 * Lock the non-leaf tables in the partition tree controlled by this node.
-	 * It's a no-op for non-partitioned parent tables.
-	 */
-	ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
-
-	/*
 	 * create new AppendState for our append node
 	 */
 	appendstate->ps.plan = (Plan *) node;
 	appendstate->ps.state = estate;
 	appendstate->ps.ExecProcNode = ExecAppend;
 
+	/*
+	 * Lock the non-leaf tables in the partition tree controlled by this node.
+	 * It's a no-op for non-partitioned parent tables.
+	 */
+	ExecLockNonLeafAppendTables((PlanState *) appendstate, estate,
+								node->partitioned_rels);
+
 	/* Let choose_next_subplan_* function handle setting the first subplan */
 	appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
@@ -137,8 +138,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 		ExecAssignExprContext(estate, &appendstate->ps);
 
 		/* Create the working data structure for pruning. */
+
+		/* ExecLockNonLeafAppendTables must have set this up. */
+		Assert(appendstate->partitioned_rels != NULL);
 		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-												   node->part_prune_infos);
+												   node->part_prune_infos,
+												   appendstate->partitioned_rels);
 		appendstate->as_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
@@ -318,6 +323,7 @@ ExecEndAppend(AppendState *node)
 	PlanState **appendplans;
 	int			nplans;
 	int			i;
+	int			num_partitioned_rels;
 
 	/*
 	 * get information from the node
@@ -326,16 +332,19 @@ ExecEndAppend(AppendState *node)
 	nplans = node->as_nplans;
 
 	/*
+	 * Close partitioned rels that we may have opened for partition
+	 * pruning.
+	 */
+	num_partitioned_rels = node->num_partitioned_rels;
+	Assert(node->partitioned_rels != NULL || num_partitioned_rels == 0);
+	for (i = 0; i < num_partitioned_rels; i++)
+		heap_close(node->partitioned_rels[i], NoLock);
+
+	/*
 	 * shut down each of the subscans
 	 */
 	for (i = 0; i < nplans; i++)
 		ExecEndNode(appendplans[i]);
-
-	/*
-	 * release any resources associated with run-time pruning
-	 */
-	if (node->as_prune_state)
-		ExecDestroyPartitionPruneState(node->as_prune_state);
 }
 
 void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 118f4ef07d..6f28002ce2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -76,7 +76,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	 * Lock the non-leaf tables in the partition tree controlled by this node.
 	 * It's a no-op for non-partitioned parent tables.
 	 */
-	ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+	ExecLockNonLeafAppendTables((PlanState *) mergestate, estate,
+								node->partitioned_rels);
 
 	/*
 	 * Set up empty vector of subplan states
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..33c3f73e78 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -209,8 +209,8 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 						PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-							  List *partitionpruneinfo);
-extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
+							  List *partitionpruneinfo,
+							  Relation *partitioned_rels);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 								int nsubplans);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f82b51667f..2b5eec9896 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -524,7 +524,9 @@ extern void UnregisterExprContextCallback(ExprContext *econtext,
 							  ExprContextCallbackFunction function,
 							  Datum arg);
 
-extern void ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate);
+extern void ExecLockNonLeafAppendTables(PlanState *planstate,
+							EState *estate,
+							List *partitioned_rels);
 
 extern Datum GetAttributeByName(HeapTupleHeader tuple, const char *attname,
 				   bool *isNull);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da7f52cab0..a20d94fd9f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1092,6 +1092,8 @@ struct AppendState
 										 * the first partial plan */
 	ParallelAppendState *as_pstate; /* parallel coordination info */
 	Size		pstate_len;		/* size of parallel coordination info */
+	Relation   *partitioned_rels;
+	int			num_partitioned_rels;	/* number of entries in above array */
 	struct PartitionPruneState *as_prune_state;
 	Bitmapset  *as_valid_subplans;
 	bool		(*choose_next_subplan) (AppendState *);
#55David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#53)
Re: why partition pruning doesn't work?

On 14 June 2018 at 04:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's still one thing I'm a bit confused about here. I noticed that
we weren't actually using the partopfamily and partopcintype fields in
PartitionPruneContext, so I removed them. But that still leaves both
partsupfunc and partcollation as pointers into the relcache that were
subject to this hazard. My testing agrees with lousyjack's results
that both of those were, in fact, being improperly accessed. The OID
comparison effect I mentioned upthread explains why the buildfarm's
cache-clobbering members failed to notice any problem with garbage
partsupfunc data ... but why did we not see any complaints about invalid
collation OIDs? Tis strange.

FWIW It's not working for me before e23bae82cf3 with
CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS and RELCACHE_FORCE_RELEASE,
and:

create table listp (a text) partition by list(a);
create table listp1 partition of listp for values in ('1');
select * from listp where a = (select '1');

I get:

ERROR: cache lookup failed for collation 2139062143
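
That OID value is consistent with a read of freed memory: as far as I know, CLOBBER_FREED_MEMORY builds overwrite released chunks with the byte 0x7F, and four such bytes read as a 32-bit value are 2139062143. A minimal standalone check, where Oid is redeclared locally as a stand-in for the PostgreSQL typedef and clobbered_oid is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;			/* stand-in for the PostgreSQL typedef */

/* Return what a 4-byte OID looks like after being filled with 0x7F bytes. */
static Oid
clobbered_oid(void)
{
	Oid			oid;

	assert(sizeof(oid) == 4);
	memset(&oid, 0x7F, sizeof(oid));
	return oid;					/* 0x7F7F7F7F, i.e. 2139062143 */
}
```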

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#56David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#54)
Re: why partition pruning doesn't work?

On 14 June 2018 at 19:17, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

I had sent a patch to try to get rid of the open(NoLock) there a couple of
months ago [1]. The idea was to both lock and open the relation in
ExecNonLeafAppendTables, which is the first time all partitioned tables in
a given Append node are locked for execution. Also, the patch makes it a
responsibility of ExecEndAppend to release the relcache pins, so the
recently added ExecDestroyPartitionPruneState would not be needed.

Robert and I briefly discussed something more complete a while back.
Not much detail was talked about, but the basic idea was to store the
Relation somewhere in the planner and executor that we could lookup by
rel index rather than having to relation_open all the time.

I just had a very quick look around and thought maybe RangeTblEntry
might be a good spot to store the Relation and current lock level that
we hold on that relation. This means that we can use
PlannerInfo->simple_rte_array in the planner and
EState->es_range_table in the executor. The latter of those would be
much nicer if we built an array instead of keeping the list (can
probably build that during InitPlan()). We could then get hold of a
Relation object when needed without having to do relation_open(...,
NoLock).

Code which currently looks like:

reloid = getrelid(scanrelid, estate->es_range_table);
rel = heap_open(reloid, lockmode);

In places like ExecOpenScanRelation, could be replaced with some
wrapper function call like: rte_open_rel(estate, scanrelid, lockmode);

This could also be used to ERROR out if rte_open_rel() was done
initially with NoLock. Right now there are so many places that we do
relation_open(..., NoLock) and write a comment /* Lock already taken
by ... */, which we hope is always true.

If the Relation is already populated in the RangeTblEntry then the
function would just return that, otherwise, we'd just look inside the
RangeTblEntry for the relation Oid and do a heap_open on it, and store
the lockmode that's held. We could then also consider getting rid of a
bunch of fields like boundinfo and nparts from RelOptInfo.
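
A very rough standalone sketch of that wrapper, just to make the idea concrete. Everything here is hypothetical: RelSlot, MockRelation and mock_heap_open are stand-ins invented for the example, returning NULL stands in for ERROR, and a counter stands in for the WARNING about lock upgrades.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the lock levels used in this thread. */
typedef enum { NoLock = 0, AccessShareLock = 1, RowShareLock = 2 } LockMode;

typedef struct MockRelation { unsigned int relid; } MockRelation;

/* One slot per range table index: relation OID plus cached open state. */
typedef struct RelSlot
{
	unsigned int reloid;
	MockRelation *rel;			/* NULL until first opened */
	LockMode	lockmode;		/* lock level taken at first open */
} RelSlot;

#define MAX_MOCK_RELS 8
static MockRelation mock_rels[MAX_MOCK_RELS];
static int	mock_open_calls = 0;
static int	lock_upgrade_warnings = 0;

/* Pretend to open and lock a relation; only counts the calls. */
static MockRelation *
mock_heap_open(unsigned int reloid, LockMode lockmode)
{
	MockRelation *rel = &mock_rels[mock_open_calls % MAX_MOCK_RELS];

	(void) lockmode;
	rel->relid = reloid;
	mock_open_calls++;
	return rel;
}

/* rti is 1-based, like a range table index. */
static MockRelation *
rte_open_rel(RelSlot *slots, int nslots, int rti, LockMode lockmode)
{
	RelSlot    *slot;

	assert(slots != NULL);
	if (rti < 1 || rti > nslots)
		return NULL;
	slot = &slots[rti - 1];

	if (slot->rel == NULL)
	{
		/* First open must take a real lock; NoLock here is an error. */
		if (lockmode == NoLock)
			return NULL;
		slot->rel = mock_heap_open(slot->reloid, lockmode);
		slot->lockmode = lockmode;
	}
	else if (lockmode > slot->lockmode)
		lock_upgrade_warnings++;	/* asked for more than we hold */

	return slot->rel;
}
```

Later callers asking with NoLock get the cached pointer back without a second open, which is the case the /* Lock already taken by ... */ comments rely on today.
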

We could also perhaps do a WARNING about lock upgrade hazards in
there, at least maybe in debug builds.

However, I only spent about 10 mins looking into this, there may be
some giant holes in the idea. It would need much more research.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#57Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#56)
Re: why partition pruning doesn't work?

On Thu, Jun 14, 2018 at 7:23 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

However, I only spent about 10 mins looking into this, there may be
some giant holes in the idea. It would need much more research.

It kind of flies in the face of the idea that a RangeTblEntry is just
a node that can be freely copied around, serialized and deserialized,
etc.

I think it would be better to keep the pointer in the RelOptInfo in
the planner and in the EState or PlanState in the executor. Those are
things we don't think can be copied, serialized, etc.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#57)
Re: why partition pruning doesn't work?

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 14, 2018 at 7:23 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

However, I only spent about 10 mins looking into this, there may be
some giant holes in the idea. It would need much more research.

It kind of flies in the face of the idea that a RangeTblEntry is just
a node that can be freely copied around, serialized and deserialized,
etc.

And also the idea that the Plan tree is read-only to the executor,
which is not a good property to give up.

I think it would be better to keep the pointer in the RelOptInfo in
the planner and in the EState or PlanState in the executor. Those are
things we don't think can be copied, serialized, etc.

Yeah, RelOptInfo seems like the natural place in the planner; we might
need index relcache links in IndexOptInfo, too.

I'm less sure what to do in the executor. We already do keep open
relation pointers in PlanStates; the problem is just that it's
node-type-specific (ss_currentRelation, iss_RelationDesc, etc). Perhaps
that's unavoidable and we should just add more such fields as needed.

regards, tom lane

#59Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#58)
1 attachment(s)
Re: why partition pruning doesn't work?

On 2018/06/14 23:40, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 14, 2018 at 7:23 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

However, I only spent about 10 mins looking into this, there may be
some giant holes in the idea. It would need much more research.

It kind of flies in the face of the idea that a RangeTblEntry is just
a node that can be freely copied around, serialized and deserialized,
etc.

And also the idea that the Plan tree is read-only to the executor,
which is not a good property to give up.

I think it would be better to keep the pointer in the RelOptInfo in
the planner and in the EState or PlanState in the executor. Those are
things we don't think can be copied, serialized, etc.

Yeah, RelOptInfo seems like the natural place in the planner; we might
need index relcache links in IndexOptInfo, too.

I'm less sure what to do in the executor. We already do keep open
relation pointers in PlanStates; the problem is just that it's
node-type-specific (ss_currentRelation, iss_RelationDesc, etc). Perhaps
that's unavoidable and we should just add more such fields as needed.

The patch I mentioned upthread maintains an array of Relation pointers in
AppendState with as many members as there are in the partitioned_rels list
that appears in the corresponding Append plan.

I revised that patch a bit to rename ExecLockNonLeafAppendTables to
ExecOpenAppendPartitionedTables to sound consistent with
ExecOpenScanRelation et al.

Thanks,
Amit

Attachments:

0001-Open-partitioned-tables-during-Append-initialization.patch (text/plain; charset=UTF-8)
From d55c30f8330081bc7c2eadea61d236a3b5de0f87 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 14 Jun 2018 14:50:22 +0900
Subject: [PATCH] Open partitioned tables during Append initialization

---
 src/backend/executor/execPartition.c   |  40 ++------
 src/backend/executor/execUtils.c       | 173 ++++++++++++++++++++++-----------
 src/backend/executor/nodeAppend.c      |  33 ++++---
 src/backend/executor/nodeMergeAppend.c |   9 +-
 src/include/executor/execPartition.h   |   4 +-
 src/include/executor/executor.h        |   4 +-
 src/include/nodes/execnodes.h          |   2 +
 7 files changed, 157 insertions(+), 108 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..ea6b05934b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1401,7 +1401,8 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * in each PartitionPruneInfo.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+							  Relation *partitioned_rels)
 {
 	PartitionPruneState *prunestate;
 	PartitionPruningData *prunedata;
@@ -1440,6 +1441,7 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
 		PartitionPruningData *pprune = &prunedata[i];
 		PartitionPruneContext *context = &pprune->context;
+		Relation	rel;
 		PartitionDesc partdesc;
 		PartitionKey partkey;
 		int			partnatts;
@@ -1460,17 +1462,11 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		/* present_parts is also subject to later modification */
 		pprune->present_parts = bms_copy(pinfo->present_parts);
 
-		/*
-		 * We need to hold a pin on the partitioned table's relcache entry so
-		 * that we can rely on its copies of the table's partition key and
-		 * partition descriptor.  We need not get a lock though; one should
-		 * have been acquired already by InitPlan or
-		 * ExecLockNonLeafAppendTables.
-		 */
-		context->partrel = relation_open(pinfo->reloid, NoLock);
-
-		partkey = RelationGetPartitionKey(context->partrel);
-		partdesc = RelationGetPartitionDesc(context->partrel);
+		Assert(partitioned_rels[i] != NULL);
+		rel = partitioned_rels[i];
+		Assert(RelationGetRelid(rel) == pinfo->reloid);
+		partkey = RelationGetPartitionKey(rel);
+		partdesc = RelationGetPartitionDesc(rel);
 		n_steps = list_length(pinfo->pruning_steps);
 
 		context->strategy = partkey->strategy;
@@ -1546,26 +1542,6 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 }
 
 /*
- * ExecDestroyPartitionPruneState
- *		Release resources at plan shutdown.
- *
- * We don't bother to free any memory here, since the whole executor context
- * will be going away shortly.  We do need to release our relcache pins.
- */
-void
-ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
-{
-	int			i;
-
-	for (i = 0; i < prunestate->num_partprunedata; i++)
-	{
-		PartitionPruningData *pprune = &prunestate->partprunedata[i];
-
-		relation_close(pprune->context.partrel, NoLock);
-	}
-}
-
-/*
  * ExecFindInitialMatchingSubPlans
  *		Identify the set of subplans that cannot be eliminated by initial
  *		pruning (disregarding any pruning constraints involving PARAM_EXEC
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..540b885c55 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -704,6 +704,124 @@ ExecCloseScanRelation(Relation scanrel)
 }
 
 /*
+ * ExecOpenAppendPartitionedTables
+ *
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list of a given Append or MergeAppend
+ * node.  These are the non-leaf tables in the partition tree controlled by
+ * the node.
+ */
+void
+ExecOpenAppendPartitionedTables(PlanState *planstate,
+								EState *estate,
+								List *partitioned_rels)
+{
+	PlannedStmt *stmt = estate->es_plannedstmt;
+	ListCell   *lc;
+	int			i;
+
+	Assert(partitioned_rels != NIL);
+
+	/*
+	 * If we're going to be performing pruning, allocate space for Relation
+	 * pointers to be used later when setting up partition pruning state in
+	 * ExecCreatePartitionPruneState.
+	 */
+	if (IsA(planstate, AppendState))
+	{
+		AppendState *appendstate = (AppendState *) planstate;
+		Append *node = (Append *) planstate->plan;
+
+		if (node->part_prune_infos != NIL)
+		{
+			Assert(list_length(node->part_prune_infos) ==
+				   list_length(partitioned_rels));
+			appendstate->partitioned_rels = (Relation *)
+								palloc(sizeof(Relation) *
+									   list_length(node->part_prune_infos));
+			appendstate->num_partitioned_rels =
+									   list_length(node->part_prune_infos);
+		}
+	}
+
+	i = 0;
+	foreach(lc, partitioned_rels)
+	{
+		ListCell   *l;
+		Index		rti = lfirst_int(lc);
+		bool		is_result_rel = false;
+		Oid			relid = getrelid(rti, estate->es_range_table);
+		int			lockmode;
+
+		/*
+		 * We need not bother about partitioned result relations here, they're
+		 * taken care of in InitPlan.
+		 */
+		foreach(l, stmt->nonleafResultRelations)
+		{
+			if (rti == lfirst_int(l))
+			{
+				is_result_rel = true;
+				break;
+			}
+		}
+
+		/*
+		 * Not a result relation; check if there is a RowMark that requires
+		 * taking a RowShareLock on this rel.
+		 */
+		if (!is_result_rel)
+		{
+			PlanRowMark *rc = NULL;
+
+			foreach(l, stmt->rowMarks)
+			{
+				if (((PlanRowMark *) lfirst(l))->rti == rti)
+				{
+					rc = lfirst(l);
+					break;
+				}
+			}
+
+			if (rc && RowMarkRequiresRowShareLock(rc->markType))
+				lockmode = RowShareLock;
+			else
+				lockmode = AccessShareLock;
+			switch (nodeTag(planstate))
+			{
+				/*
+				 * We need to also hold a pin on the partitioned table's
+				 * relcache entry so that we can rely on its copies of the
+				 * table's partition key and partition descriptor, which
+				 * are used when setting up partition pruning state.
+				 */
+				case T_AppendState:
+					{
+						AppendState *appendstate = (AppendState *) planstate;
+
+						if (appendstate->partitioned_rels)
+							appendstate->partitioned_rels[i] =
+												heap_open(relid, lockmode);
+						else
+							LockRelationOid(relid, lockmode);
+						i++;
+					}
+
+				/* Just lock here; there is no pruning support. */
+				case T_MergeAppendState:
+					LockRelationOid(relid, lockmode);
+					break;
+
+				default:
+					elog(ERROR, "invalid PlanState node: %d",
+						 nodeTag(planstate));
+					break;
+			}
+		}
+	}
+}
+
+/*
  * UpdateChangedParamSet
  *		Add changed parameters to a plan node's chgParam set
  */
@@ -856,61 +974,6 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 }
 
 /*
- * ExecLockNonLeafAppendTables
- *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
- */
-void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
-{
-	PlannedStmt *stmt = estate->es_plannedstmt;
-	ListCell   *lc;
-
-	foreach(lc, partitioned_rels)
-	{
-		ListCell   *l;
-		Index		rti = lfirst_int(lc);
-		bool		is_result_rel = false;
-		Oid			relid = getrelid(rti, estate->es_range_table);
-
-		/* If this is a result relation, already locked in InitPlan */
-		foreach(l, stmt->nonleafResultRelations)
-		{
-			if (rti == lfirst_int(l))
-			{
-				is_result_rel = true;
-				break;
-			}
-		}
-
-		/*
-		 * Not a result relation; check if there is a RowMark that requires
-		 * taking a RowShareLock on this rel.
-		 */
-		if (!is_result_rel)
-		{
-			PlanRowMark *rc = NULL;
-
-			foreach(l, stmt->rowMarks)
-			{
-				if (((PlanRowMark *) lfirst(l))->rti == rti)
-				{
-					rc = lfirst(l);
-					break;
-				}
-			}
-
-			if (rc && RowMarkRequiresRowShareLock(rc->markType))
-				LockRelationOid(relid, RowShareLock);
-			else
-				LockRelationOid(relid, AccessShareLock);
-		}
-	}
-}
-
-/*
  *		GetAttributeByName
  *		GetAttributeByNum
  *
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..24604aa311 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -113,18 +113,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	Assert(!(eflags & EXEC_FLAG_MARK));
 
 	/*
-	 * Lock the non-leaf tables in the partition tree controlled by this node.
-	 * It's a no-op for non-partitioned parent tables.
-	 */
-	ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
-
-	/*
 	 * create new AppendState for our append node
 	 */
 	appendstate->ps.plan = (Plan *) node;
 	appendstate->ps.state = estate;
 	appendstate->ps.ExecProcNode = ExecAppend;
 
+	/* Lock and open (if needed) the partitioned tables. */
+	if (node->partitioned_rels != NIL)
+		ExecOpenAppendPartitionedTables((PlanState *) appendstate, estate,
+										node->partitioned_rels);
+
 	/* Let choose_next_subplan_* function handle setting the first subplan */
 	appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
@@ -137,8 +136,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 		ExecAssignExprContext(estate, &appendstate->ps);
 
 		/* Create the working data structure for pruning. */
+
+		/* ExecLockNonLeafAppendTables must have set this up. */
+		Assert(appendstate->partitioned_rels != NULL);
 		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-												   node->part_prune_infos);
+												   node->part_prune_infos,
+												   appendstate->partitioned_rels);
 		appendstate->as_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
@@ -318,6 +321,7 @@ ExecEndAppend(AppendState *node)
 	PlanState **appendplans;
 	int			nplans;
 	int			i;
+	int			num_partitioned_rels;
 
 	/*
 	 * get information from the node
@@ -326,16 +330,19 @@ ExecEndAppend(AppendState *node)
 	nplans = node->as_nplans;
 
 	/*
+	 * Close partitioned rels that we may have opened for partition
+	 * pruning.
+	 */
+	num_partitioned_rels = node->num_partitioned_rels;
+	Assert(node->partitioned_rels != NULL || num_partitioned_rels == 0);
+	for (i = 0; i < num_partitioned_rels; i++)
+		heap_close(node->partitioned_rels[i], NoLock);
+
+	/*
 	 * shut down each of the subscans
 	 */
 	for (i = 0; i < nplans; i++)
 		ExecEndNode(appendplans[i]);
-
-	/*
-	 * release any resources associated with run-time pruning
-	 */
-	if (node->as_prune_state)
-		ExecDestroyPartitionPruneState(node->as_prune_state);
 }
 
 void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 118f4ef07d..60ce18ea95 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -72,11 +72,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
 
-	/*
-	 * Lock the non-leaf tables in the partition tree controlled by this node.
-	 * It's a no-op for non-partitioned parent tables.
-	 */
-	ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+	/* Lock and open (if needed) the partitioned tables. */
+	if (node->partitioned_rels != NIL)
+		ExecOpenAppendPartitionedTables((PlanState *) mergestate, estate,
+										node->partitioned_rels);
 
 	/*
 	 * Set up empty vector of subplan states
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..33c3f73e78 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -209,8 +209,8 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 						PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-							  List *partitionpruneinfo);
-extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
+							  List *partitionpruneinfo,
+							  Relation *partitioned_rels);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 								int nsubplans);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f82b51667f..eb6bc9d14f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -524,7 +524,9 @@ extern void UnregisterExprContextCallback(ExprContext *econtext,
 							  ExprContextCallbackFunction function,
 							  Datum arg);
 
-extern void ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate);
+extern void ExecOpenAppendPartitionedTables(PlanState *planstate,
+								EState *estate,
+								List *partitioned_rels);
 
 extern Datum GetAttributeByName(HeapTupleHeader tuple, const char *attname,
 				   bool *isNull);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da7f52cab0..a20d94fd9f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1092,6 +1092,8 @@ struct AppendState
 										 * the first partial plan */
 	ParallelAppendState *as_pstate; /* parallel coordination info */
 	Size		pstate_len;		/* size of parallel coordination info */
+	Relation   *partitioned_rels;
+	int			num_partitioned_rels;	/* number of entries in above array */
 	struct PartitionPruneState *as_prune_state;
 	Bitmapset  *as_valid_subplans;
 	bool		(*choose_next_subplan) (AppendState *);
-- 
2.11.0

#60Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#56)
Re: why partition pruning doesn't work?

On 2018/06/14 20:23, David Rowley wrote:

On 14 June 2018 at 19:17, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

I had sent a patch to try to get rid of the open(NoLock) there a couple of
months ago [1]. The idea was to both lock and open the relation in
ExecLockNonLeafAppendTables, which is the first time all partitioned tables in
a given Append node are locked for execution. Also, the patch makes it a
responsibility of ExecEndAppend to release the relcache pins, so the
recently added ExecDestroyPartitionPruneState would not be needed.

Robert and I briefly discussed something more complete a while back.
Not much detail was talked about, but the basic idea was to store the
Relation somewhere in the planner and executor that we could look up by
rel index rather than having to relation_open all the time.

I just had a very quick look around and thought maybe RangeTblEntry
might be a good spot to store the Relation and current lock level that
we hold on that relation. This means that we can use
PlannerInfo->simple_rte_array in the planner and
EState->es_range_table in the executor. The latter of those would be
much nicer if we built an array instead of keeping the list (can
probably build that during InitPlan()). We could then get hold of a
Relation object when needed without having to do relation_open(...,
NoLock).

Code which currently looks like:

reloid = getrelid(scanrelid, estate->es_range_table);
rel = heap_open(reloid, lockmode);

In places like ExecOpenScanRelation, could be replaced with some
wrapper function call like: rte_open_rel(estate, scanrelid, lockmode);

This could also be used to ERROR out if rte_open_rel() was done
initially with NoLock. Right now there are so many places that we do
relation_open(..., NoLock) and write a comment /* Lock already taken
by ... */, which we hope is always true.

If the Relation is already populated in the RangeTblEntry then the
function would just return that, otherwise, we'd just look inside the
RangeTblEntry for the relation Oid and do a heap_open on it, and store
the lockmode that's held. We could then also consider getting rid of a
bunch of fields like boundinfo and nparts from RelOptInfo.

We could also perhaps do a WARNING about lock upgrade hazards in
there, at least maybe in debug builds.

However, I only spent about 10 mins looking into this, there may be
some giant holes in the idea. It would need much more research.

Will also need to consider caching of plans, that is of PlannedStmts,
which in turn contain RangeTblEntry nodes. The idea that we can open a
relation at the beginning or even before planning and keep using the
Relation pointer all the way to the end of execution of a query seems hard
to realize. As others pointed out, we need to think of planning and
execution as separate phases and will have to open/close relations separately.

Thanks,
Amit

#61Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Andrew Dunstan (#41)
Re: why partition pruning doesn't work?

On 06/12/2018 07:47 AM, Andrew Dunstan wrote:

On 06/11/2018 06:41 PM, Andrew Dunstan wrote:

On 06/11/2018 06:24 PM, Tom Lane wrote:

If we had any buildfarm critters running valgrind on
RELCACHE_FORCE_RELEASE or CLOBBER_CACHE_ALWAYS builds, they'd have
detected use of uninitialized memory here ... but I don't think we have
any.  (The combination of valgrind and CCA would probably be too slow to
be practical :-(, though maybe somebody with a fast machine could do
the other thing.)

I don't have a fast machine, but I do have a slow machine already
running valgrind and not doing much else :-) Let's see how lousyjack
does with -DRELCACHE_FORCE_RELEASE

It added about 20% to the run time. That's tolerable, so I'll leave it
on.

OK, lousyjack is back online with this, new and improved. It currently
takes 7.5 hours for a run.  Should I also add -DCATCACHE_FORCE_RELEASE?

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#61)
Re: why partition pruning doesn't work?

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

OK, lousyjack is back online with this, new and improved. It currently
takes 7.5 hours for a run.  Should I also add -DCATCACHE_FORCE_RELEASE?

I did some experimentation yesterday with valgrind plus both
RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE. I didn't find
any new bugs, but adding CATCACHE_FORCE_RELEASE makes things quite
a lot slower :-(. Not sure it's worth it.

regards, tom lane

#63Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tom Lane (#62)
Re: why partition pruning doesn't work?

On 06/15/2018 10:02 AM, Tom Lane wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

OK, lousyjack is back online with this, new and improved. It currently
takes 7.5 hours for a run.  Should I also add -DCATCACHE_FORCE_RELEASE?

I did some experimentation yesterday with valgrind plus both
RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE. I didn't find
any new bugs, but adding CATCACHE_FORCE_RELEASE makes things quite
a lot slower :-(. Not sure it's worth it.

OK.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#64Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#59)
Re: why partition pruning doesn't work?

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

[ 0001-Open-partitioned-tables-during-Append-initialization.patch ]

I took a look at this. While I'm in agreement with the general idea of
holding open the partitioned relations' relcache entries throughout the
query, I do not like anything about this patch:

* It seems to be outright broken for the case that any of the partitioned
relations are listed in nonleafResultRelations. If we're going to do it
like this, we have to open every one of the partrels regardless of that.
(I wonder whether we couldn't lose PlannedStmt.nonleafResultRelations
altogether, in favor of merging the related code in InitPlan with this.
That existing code is already a mighty ugly wart, and this patch makes
it worse by adding new, related warts elsewhere.)

* You've got *far* too much intimate knowledge of the possible callers
in ExecOpenAppendPartitionedTables.

Personally, what I would have this function do is return a List of
the opened Relation pointers, and add a matching function to run through
such a List and close the entries again, and make the callers responsible
for stashing the List pointer in an appropriate field in their planstate.
Or maybe what we should do is drop ExecLockNonLeafAppendTables/
ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.
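
The shape of the first suggestion above (an open function that returns the opened relations, a matching close function, and the caller owning the result) can be sketched in standalone C as follows. This is only an illustration under assumptions: the Sketch* types and functions are hypothetical mocks, a plain array stands in for a backend List, and no locking is modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock relcache entry with a pin count. */
typedef struct
{
	unsigned int relid;
	int			pins;
} SketchRel;

/* Stand-in for a List of opened Relation pointers. */
typedef struct
{
	SketchRel **rels;
	size_t		n;
} SketchRelList;

static SketchRel sketch_cache[16];

/* Hypothetical stand-in for heap_open(): take a pin on the entry. */
static SketchRel *
sketch_open(unsigned int relid)
{
	SketchRel  *rel = &sketch_cache[relid % 16];

	rel->relid = relid;
	rel->pins++;
	return rel;
}

/* Hypothetical stand-in for heap_close(): release the pin. */
static void
sketch_close(SketchRel *rel)
{
	assert(rel->pins > 0);
	rel->pins--;
}

/* Open every relation and hand the collection back to the caller. */
SketchRelList
sketch_open_partitioned(const unsigned int *relids, size_t n)
{
	SketchRelList list;

	list.rels = malloc(n * sizeof(SketchRel *));
	list.n = (list.rels != NULL) ? n : 0;
	for (size_t i = 0; i < list.n; i++)
		list.rels[i] = sketch_open(relids[i]);
	return list;
}

/* Matching function: close everything the open function returned. */
void
sketch_close_partitioned(SketchRelList *list)
{
	for (size_t i = 0; i < list->n; i++)
		sketch_close(list->rels[i]);
	free(list->rels);
	list->rels = NULL;
	list->n = 0;
}
```

With this split the open function needs no knowledge of its callers; each caller would store the returned collection in its own state struct and pass it to the close function at shutdown.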

regards, tom lane

#65Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#64)
1 attachment(s)
Re: why partition pruning doesn't work?

On 2018/06/19 2:05, Tom Lane wrote:

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

[ 0001-Open-partitioned-tables-during-Append-initialization.patch ]

I took a look at this. While I'm in agreement with the general idea of
holding open the partitioned relations' relcache entries throughout the
query, I do not like anything about this patch:

Thanks for taking a look at it and sorry about the delay in replying.

* It seems to be outright broken for the case that any of the partitioned
relations are listed in nonleafResultRelations. If we're going to do it
like this, we have to open every one of the partrels regardless of that.

Yes, that was indeed wrong.

(I wonder whether we couldn't lose PlannedStmt.nonleafResultRelations
altogether, in favor of merging the related code in InitPlan with this.

Hmm, PlannedStmt.nonleafResultRelations exists for the same reason that
PlannedStmt.resultRelations does, that is:

/*
 * initialize result relation stuff, and open/lock the result rels.
 *
 * We must do this before initializing the plan tree, else we might try to
 * do a lock upgrade if a result rel is also a source rel.
 */

nonleafResultRelations contains members of partitioned_rels lists of all
ModifyTable nodes contained in a plan.

That existing code is already a mighty ugly wart, and this patch makes
it worse by adding new, related warts elsewhere.)

I just realized that there is a thinko in the following piece of code in
ExecLockNonLeafAppendTables

/* If this is a result relation, already locked in InitPlan */
foreach(l, stmt->nonleafResultRelations)
{
	if (rti == lfirst_int(l))
	{
		is_result_rel = true;
		break;
	}
}

It should actually be:

/* If this is a result relation, already locked in InitPlan */
foreach(l, stmt->nonleafResultRelations)
{
	Index		nonleaf_rti = lfirst_int(l);
	Oid			nonleaf_relid = getrelid(nonleaf_rti,
										 estate->es_range_table);

	if (relid == nonleaf_relid)
	{
		is_result_rel = true;
		break;
	}
}

RT indexes in, say, Append.partitioned_rels, are distinct from those in
PlannedStmt.nonleafResultRelations, so the existing test never succeeds,
as also evident from the coverage report:

https://coverage.postgresql.org/src/backend/executor/execUtils.c.gcov.html#864

I'm wondering if we couldn't just get rid of this code. If an input
partitioned table is indeed also a result relation, then we would've
locked it in InitPlan with RowExclusiveLock and heap_opening it with a
weaker lock (RowShare/AccessShare) wouldn't hurt.
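
To make the RT-index versus OID distinction described above concrete, here is a minimal standalone sketch. It assumes a range table in which the same relation appears under two different RT indexes, as the message describes; the Sketch* names and the OID values are made up for illustration and are not backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int SketchOid;

/* Mock of getrelid(): map a 1-based RT index to the relation OID. */
static SketchOid
sketch_getrelid(int rti, const SketchOid *rtable)
{
	return rtable[rti - 1];
}

/* The existing style of test: compare RT indexes directly. */
bool
sketch_is_result_rel_by_rti(int rti, const int *result_rtis, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (rti == result_rtis[i])
			return true;
	}
	return false;
}

/* The proposed style of test: compare the OIDs the RT indexes resolve to. */
bool
sketch_is_result_rel_by_relid(int rti, const int *result_rtis, size_t n,
							  const SketchOid *rtable)
{
	SketchOid	relid = sketch_getrelid(rti, rtable);

	for (size_t i = 0; i < n; i++)
	{
		if (relid == sketch_getrelid(result_rtis[i], rtable))
			return true;
	}
	return false;
}
```

For example, with a range table of OIDs {16384, 16384, 16390} and a result-relation list holding only RT index 1, a scan that refers to RT index 2 is missed by the index comparison but caught by the OID comparison, since both entries resolve to 16384.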

* You've got *far* too much intimate knowledge of the possible callers
in ExecOpenAppendPartitionedTables.

Personally, what I would have this function do is return a List of
the opened Relation pointers, and add a matching function to run through
such a List and close the entries again, and make the callers responsible
for stashing the List pointer in an appropriate field in their planstate.

OK, I rewrote it to work that way.

Or maybe what we should do is drop ExecLockNonLeafAppendTables/
ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
have to have all the partitioned tables (contained in partitioned_rels
fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
in a global list like rootResultRelations and nonleafResultRelations of
PlannedStmt.

Attached updated patch.

Thanks,
Amit

Attachments:

v2-0001-Fix-opening-closing-of-partitioned-tables-in-Appe.patch (text/plain; charset=UTF-8)
From 0fd93b0b2108d4d12f483b96aab2c25c120173cb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 14 Jun 2018 14:50:22 +0900
Subject: [PATCH v2] Fix opening/closing of partitioned tables in Append plans

---
 src/backend/executor/execPartition.c   |  49 ++++----------
 src/backend/executor/execUtils.c       | 114 +++++++++++++++++----------------
 src/backend/executor/nodeAppend.c      |  26 ++++----
 src/backend/executor/nodeMergeAppend.c |  11 ++--
 src/include/executor/execPartition.h   |   4 +-
 src/include/executor/executor.h        |   4 +-
 src/include/nodes/execnodes.h          |   4 ++
 7 files changed, 100 insertions(+), 112 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..b9bc157bfa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1399,13 +1399,19 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  * PartitionPruningData for each item in that List.  This data can be re-used
  * each time we re-evaluate which partitions match the pruning steps provided
  * in each PartitionPruneInfo.
+ *
+ * 'partitioned_rels' is a List of same number of elements as there are in
+ * 'partitionpruneinfo' containing the Relation pointers of corresponding
+ * partitioned tables.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+							  List *partitioned_rels)
 {
 	PartitionPruneState *prunestate;
 	PartitionPruningData *prunedata;
-	ListCell   *lc;
+	ListCell   *lc1,
+			   *lc2;
 	int			i;
 
 	Assert(partitionpruneinfo != NIL);
@@ -1435,9 +1441,10 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 							  ALLOCSET_DEFAULT_SIZES);
 
 	i = 0;
-	foreach(lc, partitionpruneinfo)
+	forboth(lc1, partitionpruneinfo, lc2, partitioned_rels)
 	{
-		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
+		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc1));
+		Relation	rel = lfirst(lc2);
 		PartitionPruningData *pprune = &prunedata[i];
 		PartitionPruneContext *context = &pprune->context;
 		PartitionDesc partdesc;
@@ -1460,17 +1467,9 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 		/* present_parts is also subject to later modification */
 		pprune->present_parts = bms_copy(pinfo->present_parts);
 
-		/*
-		 * We need to hold a pin on the partitioned table's relcache entry so
-		 * that we can rely on its copies of the table's partition key and
-		 * partition descriptor.  We need not get a lock though; one should
-		 * have been acquired already by InitPlan or
-		 * ExecLockNonLeafAppendTables.
-		 */
-		context->partrel = relation_open(pinfo->reloid, NoLock);
-
-		partkey = RelationGetPartitionKey(context->partrel);
-		partdesc = RelationGetPartitionDesc(context->partrel);
+		Assert(RelationGetRelid(rel) == pinfo->reloid);
+		partkey = RelationGetPartitionKey(rel);
+		partdesc = RelationGetPartitionDesc(rel);
 		n_steps = list_length(pinfo->pruning_steps);
 
 		context->strategy = partkey->strategy;
@@ -1546,26 +1545,6 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 }
 
 /*
- * ExecDestroyPartitionPruneState
- *		Release resources at plan shutdown.
- *
- * We don't bother to free any memory here, since the whole executor context
- * will be going away shortly.  We do need to release our relcache pins.
- */
-void
-ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
-{
-	int			i;
-
-	for (i = 0; i < prunestate->num_partprunedata; i++)
-	{
-		PartitionPruningData *pprune = &prunestate->partprunedata[i];
-
-		relation_close(pprune->context.partrel, NoLock);
-	}
-}
-
-/*
  * ExecFindInitialMatchingSubPlans
  *		Identify the set of subplans that cannot be eliminated by initial
  *		pruning (disregarding any pruning constraints involving PARAM_EXEC
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..f936f8c3da 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -704,6 +704,65 @@ ExecCloseScanRelation(Relation scanrel)
 }
 
 /*
+ * ExecOpenAppendPartitionedTables
+ *
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list of a given Append or MergeAppend
+ * node.  These are the non-leaf tables in the partition tree controlled by
+ * the node.
+ */
+List *
+ExecOpenAppendPartitionedTables(List *partitioned_rels, EState *estate)
+{
+	PlannedStmt *stmt = estate->es_plannedstmt;
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	Assert(partitioned_rels != NIL);
+
+	foreach(lc, partitioned_rels)
+	{
+		ListCell   *l;
+		Index		rti = lfirst_int(lc);
+		Oid			relid = getrelid(rti, estate->es_range_table);
+		int			lockmode = AccessShareLock;
+		PlanRowMark *rc = NULL;
+
+		/* Check if there is a RowMark that requires taking a RowShareLock. */
+
+		foreach(l, stmt->rowMarks)
+		{
+			if (((PlanRowMark *) lfirst(l))->rti == rti)
+			{
+				rc = lfirst(l);
+				break;
+			}
+		}
+
+		if (rc && RowMarkRequiresRowShareLock(rc->markType))
+			lockmode = RowShareLock;
+
+		result = lappend(result, heap_open(relid, lockmode));
+	}
+
+	return result;
+}
+
+/* Close each Relation in the input list. */
+void
+ExecCloseAppendPartitionedTables(List *partitioned_rels)
+{
+	ListCell *lc;
+
+	/*
+	 * Close partitioned rels that we may have opened for partition
+	 * pruning.
+	 */
+	foreach(lc, partitioned_rels)
+		heap_close(lfirst(lc), NoLock);
+}
+
+/*
  * UpdateChangedParamSet
  *		Add changed parameters to a plan node's chgParam set
  */
@@ -856,61 +915,6 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 }
 
 /*
- * ExecLockNonLeafAppendTables
- *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
- */
-void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
-{
-	PlannedStmt *stmt = estate->es_plannedstmt;
-	ListCell   *lc;
-
-	foreach(lc, partitioned_rels)
-	{
-		ListCell   *l;
-		Index		rti = lfirst_int(lc);
-		bool		is_result_rel = false;
-		Oid			relid = getrelid(rti, estate->es_range_table);
-
-		/* If this is a result relation, already locked in InitPlan */
-		foreach(l, stmt->nonleafResultRelations)
-		{
-			if (rti == lfirst_int(l))
-			{
-				is_result_rel = true;
-				break;
-			}
-		}
-
-		/*
-		 * Not a result relation; check if there is a RowMark that requires
-		 * taking a RowShareLock on this rel.
-		 */
-		if (!is_result_rel)
-		{
-			PlanRowMark *rc = NULL;
-
-			foreach(l, stmt->rowMarks)
-			{
-				if (((PlanRowMark *) lfirst(l))->rti == rti)
-				{
-					rc = lfirst(l);
-					break;
-				}
-			}
-
-			if (rc && RowMarkRequiresRowShareLock(rc->markType))
-				LockRelationOid(relid, RowShareLock);
-			else
-				LockRelationOid(relid, AccessShareLock);
-		}
-	}
-}
-
-/*
  *		GetAttributeByName
  *		GetAttributeByNum
  *
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..77285af194 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -113,18 +113,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	Assert(!(eflags & EXEC_FLAG_MARK));
 
 	/*
-	 * Lock the non-leaf tables in the partition tree controlled by this node.
-	 * It's a no-op for non-partitioned parent tables.
-	 */
-	ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
-
-	/*
 	 * create new AppendState for our append node
 	 */
 	appendstate->ps.plan = (Plan *) node;
 	appendstate->ps.state = estate;
 	appendstate->ps.ExecProcNode = ExecAppend;
 
+	/* Lock and open (if needed) the partitioned tables. */
+	if (node->partitioned_rels != NIL)
+		appendstate->as_partitioned_rels =
+			ExecOpenAppendPartitionedTables(node->partitioned_rels, estate);
+
 	/* Let choose_next_subplan_* function handle setting the first subplan */
 	appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
@@ -137,8 +136,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 		ExecAssignExprContext(estate, &appendstate->ps);
 
 		/* Create the working data structure for pruning. */
-		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-												   node->part_prune_infos);
+
+		prunestate =
+				ExecCreatePartitionPruneState(&appendstate->ps,
+											  node->part_prune_infos,
+											  appendstate->as_partitioned_rels);
 		appendstate->as_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
@@ -325,17 +327,13 @@ ExecEndAppend(AppendState *node)
 	appendplans = node->appendplans;
 	nplans = node->as_nplans;
 
+	ExecCloseAppendPartitionedTables(node->as_partitioned_rels);
+
 	/*
 	 * shut down each of the subscans
 	 */
 	for (i = 0; i < nplans; i++)
 		ExecEndNode(appendplans[i]);
-
-	/*
-	 * release any resources associated with run-time pruning
-	 */
-	if (node->as_prune_state)
-		ExecDestroyPartitionPruneState(node->as_prune_state);
 }
 
 void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 118f4ef07d..8de956b129 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -72,11 +72,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
 
-	/*
-	 * Lock the non-leaf tables in the partition tree controlled by this node.
-	 * It's a no-op for non-partitioned parent tables.
-	 */
-	ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+	/* Lock and open (if needed) the partitioned tables. */
+	if (node->partitioned_rels != NIL)
+		mergestate->ms_partitioned_rels =
+			ExecOpenAppendPartitionedTables(node->partitioned_rels, estate);
 
 	/*
 	 * Set up empty vector of subplan states
@@ -283,6 +282,8 @@ ExecEndMergeAppend(MergeAppendState *node)
 	mergeplans = node->mergeplans;
 	nplans = node->ms_nplans;
 
+	ExecCloseAppendPartitionedTables(node->ms_partitioned_rels);
+
 	/*
 	 * shut down each of the subscans
 	 */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..7b0744a2c9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -209,8 +209,8 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 						PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-							  List *partitionpruneinfo);
-extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
+							  List *partitionpruneinfo,
+							  List *partitioned_rels);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 								int nsubplans);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f82b51667f..301e4b4059 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -524,7 +524,9 @@ extern void UnregisterExprContextCallback(ExprContext *econtext,
 							  ExprContextCallbackFunction function,
 							  Datum arg);
 
-extern void ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate);
+extern List *ExecOpenAppendPartitionedTables(List *partitioned_rels,
+								EState *estate);
+extern void ExecCloseAppendPartitionedTables(List *partitioned_rels);
 
 extern Datum GetAttributeByName(HeapTupleHeader tuple, const char *attname,
 				   bool *isNull);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da7f52cab0..55996b80d3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1073,6 +1073,7 @@ typedef struct ModifyTableState
  *							eliminated from the scan, or NULL if not possible.
  *		valid_subplans		for runtime pruning, valid appendplans indexes to
  *							scan.
+ *		partitioned_rels	Relation pointers of partitioned tables
  * ----------------
  */
 
@@ -1095,6 +1096,7 @@ struct AppendState
 	struct PartitionPruneState *as_prune_state;
 	Bitmapset  *as_valid_subplans;
 	bool		(*choose_next_subplan) (AppendState *);
+	List	   *as_partitioned_rels;	/* List of Relation pointers */
 };
 
 /* ----------------
@@ -1106,6 +1108,7 @@ struct AppendState
  *		slots			current output tuple of each subplan
  *		heap			heap of active tuples
  *		initialized		true if we have fetched first tuple from each subplan
+ *		partitioned_rels	Relation pointers of partitioned tables
  * ----------------
  */
 typedef struct MergeAppendState
@@ -1118,6 +1121,7 @@ typedef struct MergeAppendState
 	TupleTableSlot **ms_slots;	/* array of length ms_nplans */
 	struct binaryheap *ms_heap; /* binary heap of slot indices */
 	bool		ms_initialized; /* are subplans started? */
+	List	   *ms_partitioned_rels;	/* List of Relation pointers */
 } MergeAppendState;
 
 /* ----------------
-- 
2.11.0

#66Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#65)
1 attachment(s)
Re: why partition pruning doesn't work?

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/06/19 2:05, Tom Lane wrote:

Or maybe what we should do is drop ExecLockNonLeafAppendTables/
ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
have to have all the partitioned tables (contained in partitioned_rels
fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
in a global list like rootResultRelations and nonleafResultRelations of
PlannedStmt.

Attached updated patch.

I've been looking at this patch, and while it's not unreasonable on its
own terms, I'm growing increasingly distressed at the ad-hoc and rather
duplicative nature of the data structures that have gotten stuck into
plan trees thanks to partitioning (the rootResultRelations and
nonleafResultRelations lists being prime examples).

It struck me this morning that a whole lot of the complication here is
solely due to needing to identify the right type of relation lock to take
during executor startup, and that THAT WORK IS TOTALLY USELESS. In every
case, we must already hold a suitable lock before we ever get to the
executor; either one acquired during the parse/plan pipeline, or one
re-acquired by AcquireExecutorLocks in the case of a cached plan.
Otherwise it's entirely possible that the plan has been invalidated by
concurrent DDL --- and it's not the executor's job to detect that and
re-plan; that *must* have been done upstream.

Moreover, it's important from a deadlock-avoidance standpoint that these
locks get acquired in the same order as they were acquired during the
initial parse/plan pipeline. I think it's reasonable to assume they were
acquired in RTE order in that stage, so AcquireExecutorLocks is OK. But,
if the current logic in the executor gets them in that order, it's both
non-obvious that it does so and horribly fragile if it does, seeing that
the responsibility for this is split across InitPlan,
ExecOpenScanRelation, and ExecLockNonLeafAppendTables.

So I'm thinking that what we really ought to do here is simplify executor
startup to just open all rels with NoLock, and get rid of any supporting
data structures that turn out to have no other use. (David Rowley's
nearby patch to create a properly hierarchical executor data structure for
partitioning info is likely to tie into this too, by removing some other
vestigial uses of those lists.)

I think that this idea has been discussed in the past, and we felt at
the time that having the executor take its own locks was a good safety
measure, and a relatively cheap one since the lock manager is pretty
good at short-circuiting duplicative lock requests. But those are
certainly not free. Moreover, I'm not sure that this is really a
safety measure at all: if the executor were really taking any lock
not already held, it'd be masking a DDL hazard.

To investigate this further, I made the attached not-meant-for-commit
hack to verify whether InitPlan and related executor startup functions
were actually taking any not-previously-held locks. I could only find
one such case: the parser always opens tables selected FOR UPDATE with
RowShareLock, but if we end up implementing the resulting row mark
with ROW_MARK_COPY, the executor is acquiring just AccessShareLock
(because ExecOpenScanRelation thinks it needs to acquire some lock).
The patch as presented hacks ExecOpenScanRelation to avoid that, and
it passes check-world.

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel. That would protect
not only the executor, but a boatload of existing places that open
rels with NoLock on the currently-unverified assumption that a lock is
already held.
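
A minimal sketch of what such a check could look like, assuming a toy per-backend lock table rather than the real lock manager. Every name here (`toy_lock_relation`, `toy_holds_any_lock`, `toy_open_is_safe`, `toy_relation_open`) is hypothetical and exists only for illustration; none of this is PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the real PostgreSQL definitions. */
typedef unsigned int ToyOid;

typedef enum ToyLockMode
{
	TOY_NO_LOCK = 0,
	TOY_ACCESS_SHARE_LOCK,
	TOY_ROW_SHARE_LOCK,
	TOY_ROW_EXCLUSIVE_LOCK
} ToyLockMode;

#define TOY_MAX_HELD_LOCKS 16

typedef struct ToyHeldLock
{
	ToyOid		relid;
	ToyLockMode mode;
} ToyHeldLock;

/* Hypothetical per-backend record of relation locks already held. */
static ToyHeldLock toy_held_locks[TOY_MAX_HELD_LOCKS];
static size_t toy_num_held_locks = 0;

/* Remember that a lock was taken upstream (parser, planner, plan cache). */
static bool
toy_lock_relation(ToyOid relid, ToyLockMode mode)
{
	if (mode == TOY_NO_LOCK || toy_num_held_locks >= TOY_MAX_HELD_LOCKS)
		return false;
	toy_held_locks[toy_num_held_locks].relid = relid;
	toy_held_locks[toy_num_held_locks].mode = mode;
	toy_num_held_locks++;
	return true;
}

/* Does this backend hold a lock of any strength on relid? */
static bool
toy_holds_any_lock(ToyOid relid)
{
	size_t		i;

	for (i = 0; i < toy_num_held_locks; i++)
	{
		if (toy_held_locks[i].relid == relid)
			return true;
	}
	return false;
}

/* The condition an assertion-build-only test would check on open. */
static bool
toy_open_is_safe(ToyOid relid, ToyLockMode mode)
{
	return mode != TOY_NO_LOCK || toy_holds_any_lock(relid);
}

/* Open a rel; with assertions enabled, a NoLock open of an unlocked rel aborts. */
static ToyOid
toy_relation_open(ToyOid relid, ToyLockMode mode)
{
	assert(toy_open_is_safe(relid, mode));
	if (mode != TOY_NO_LOCK)
		(void) toy_lock_relation(relid, mode);
	return relid;				/* a real version would return a Relation */
}
```

Because the check sits in the open routine itself, it would cover every NoLock caller at once, not only the executor, and it costs nothing when assertions are compiled out.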

I'm also rather strongly tempted to add a field to RangeTblEntry
that records the appropriate lock strength to take, so that we don't
have to rely on keeping AcquireExecutorLocks' logic to decide on the
lock type in sync with whatever the parse/plan pipeline does. (One
could then imagine adding assertions in the executor that this field
shows a lock strength of at least X, in place of actually opening
the rel with X.)
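
Roughly, and only as a sketch: the struct below is a toy, the field name `rellockmode` is a placeholder, and it assumes lock strengths can be compared numerically, as the lock mode constants suggest. The helper names are hypothetical too.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock modes, ordered weakest to strongest; not the real definitions. */
typedef enum ToyLockMode
{
	TOY_NO_LOCK = 0,
	TOY_ACCESS_SHARE_LOCK = 1,
	TOY_ROW_SHARE_LOCK = 2,
	TOY_ROW_EXCLUSIVE_LOCK = 3
} ToyLockMode;

/* Toy range table entry carrying the lock strength chosen upstream. */
typedef struct ToyRangeTblEntry
{
	unsigned int relid;
	ToyLockMode rellockmode;	/* hypothetical field name */
} ToyRangeTblEntry;

/* True if the lock recorded in the RTE is at least as strong as "needed". */
static bool
toy_rte_lock_at_least(const ToyRangeTblEntry *rte, ToyLockMode needed)
{
	return rte->rellockmode >= needed;
}

/*
 * Executor-side open: instead of taking lock "needed", merely assert that
 * the RTE already promises it, then open without locking.
 */
static unsigned int
toy_exec_open_scan_relation(const ToyRangeTblEntry *rte, ToyLockMode needed)
{
	assert(toy_rte_lock_at_least(rte, needed));
	return rte->relid;			/* a real version would return a Relation */
}
```

With the strength stored in one place, the cached-plan path and the executor assertions would both read the same field, so there is nothing left to keep in sync by hand.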

BTW, there'd be a lot to be said for having InitPlan just open all
the rels and build an array of Relation pointers that parallels the
RTE list, rather than doing heap_opens in random places elsewhere.
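
As a rough illustration, assuming a toy executor state and a stand-in for heap_open (all names below are hypothetical, and the sketch assumes a non-empty range table), the shape would be something like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins only; not the real Relation or EState. */
typedef struct ToyRelation
{
	unsigned int relid;
} ToyRelation;

typedef struct ToyEState
{
	size_t		nrels;
	ToyRelation **relations;	/* entry i belongs to range table index i + 1 */
	int			open_calls;		/* how many opens we actually performed */
} ToyEState;

/* Stand-in for heap_open(relid, NoLock); counts each call. */
static ToyRelation *
toy_heap_open(ToyEState *estate, unsigned int relid)
{
	ToyRelation *rel = malloc(sizeof(ToyRelation));

	if (rel == NULL)
		return NULL;
	rel->relid = relid;
	estate->open_calls++;
	return rel;
}

/* InitPlan-like step: open every rel exactly once, in range table order. */
static int
toy_init_plan(ToyEState *estate, const unsigned int *rtable_relids,
			  size_t nrels)
{
	size_t		i;

	estate->nrels = 0;
	estate->open_calls = 0;
	estate->relations = calloc(nrels, sizeof(ToyRelation *));
	if (estate->relations == NULL)
		return -1;
	estate->nrels = nrels;
	for (i = 0; i < nrels; i++)
	{
		estate->relations[i] = toy_heap_open(estate, rtable_relids[i]);
		if (estate->relations[i] == NULL)
			return -1;
	}
	return 0;
}

/* Plan nodes fetch by range table index (1-based) instead of reopening. */
static ToyRelation *
toy_get_relation(const ToyEState *estate, size_t rti)
{
	assert(rti >= 1 && rti <= estate->nrels);
	return estate->relations[rti - 1];
}

/* ExecutorEnd-like step: the single place where rels get closed. */
static void
toy_executor_end(ToyEState *estate)
{
	size_t		i;

	for (i = 0; i < estate->nrels; i++)
		free(estate->relations[i]);
	free(estate->relations);
	estate->relations = NULL;
	estate->nrels = 0;
}
```

Lookups become a plain array index, and the open and close calls each live in exactly one place.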

regards, tom lane

Attachments:

verify-executor-locks-are-already-held.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8026fe2..58c62f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -817,6 +817,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	TupleDesc	tupType;
 	ListCell   *l;
 	int			i;
+	bool		gotlock;
 
 	/*
 	 * Do permissions checks
@@ -852,7 +853,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			Relation	resultRelation;
 
 			resultRelationOid = getrelid(resultRelationIndex, rangeTable);
-			resultRelation = heap_open(resultRelationOid, RowExclusiveLock);
+			gotlock = LockRelationOid(resultRelationOid, RowExclusiveLock);
+			Assert(!gotlock || IsParallelWorker());
+			resultRelation = heap_open(resultRelationOid, NoLock);
 
 			InitResultRelInfo(resultRelInfo,
 							  resultRelation,
@@ -892,7 +895,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 				Relation	resultRelDesc;
 
 				resultRelOid = getrelid(resultRelIndex, rangeTable);
-				resultRelDesc = heap_open(resultRelOid, RowExclusiveLock);
+				gotlock = LockRelationOid(resultRelOid, RowExclusiveLock);
+				Assert(!gotlock || IsParallelWorker());
+				resultRelDesc = heap_open(resultRelOid, NoLock);
 				InitResultRelInfo(resultRelInfo,
 								  resultRelDesc,
 								  lfirst_int(l),
@@ -912,8 +917,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 				/* We locked the roots above. */
 				if (!list_member_int(plannedstmt->rootResultRelations,
 									 resultRelIndex))
-					LockRelationOid(getrelid(resultRelIndex, rangeTable),
-									RowExclusiveLock);
+				{
+					gotlock = LockRelationOid(getrelid(resultRelIndex, rangeTable),
+											  RowExclusiveLock);
+					Assert(!gotlock || IsParallelWorker());
+				}
 			}
 		}
 	}
@@ -963,10 +971,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			case ROW_MARK_NOKEYEXCLUSIVE:
 			case ROW_MARK_SHARE:
 			case ROW_MARK_KEYSHARE:
-				relation = heap_open(relid, RowShareLock);
+				gotlock = LockRelationOid(relid, RowShareLock);
+				Assert(!gotlock || IsParallelWorker());
+				relation = heap_open(relid, NoLock);
 				break;
 			case ROW_MARK_REFERENCE:
-				relation = heap_open(relid, AccessShareLock);
+				gotlock = LockRelationOid(relid, AccessShareLock);
+				Assert(!gotlock || IsParallelWorker());
+				relation = heap_open(relid, NoLock);
 				break;
 			case ROW_MARK_COPY:
 				/* no physical table access is required */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae..cf08b50 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -42,6 +42,7 @@
 
 #include "postgres.h"
 
+#include "access/parallel.h"
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "executor/executor.h"
@@ -645,6 +646,7 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 	Relation	rel;
 	Oid			reloid;
 	LOCKMODE	lockmode;
+	bool		gotlock;
 
 	/*
 	 * Determine the lock type we need.  First, scan to see if target relation
@@ -659,13 +661,19 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 		/* Keep this check in sync with InitPlan! */
 		ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);
 
-		if (erm != NULL && erm->relation != NULL)
+		/* HACK: assume things are OK for ROW_MARK_COPY case */
+		if (erm != NULL)
 			lockmode = NoLock;
 	}
 
 	/* Open the relation and acquire lock as needed */
 	reloid = getrelid(scanrelid, estate->es_range_table);
-	rel = heap_open(reloid, lockmode);
+	if (lockmode != NoLock)
+	{
+		gotlock = LockRelationOid(reloid, lockmode);
+		Assert(!gotlock || IsParallelWorker());
+	}
+	rel = heap_open(reloid, NoLock);
 
 	/*
 	 * Complain if we're attempting a scan of an unscannable relation, except
@@ -874,6 +882,7 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 		Index		rti = lfirst_int(lc);
 		bool		is_result_rel = false;
 		Oid			relid = getrelid(rti, estate->es_range_table);
+		bool		gotlock;
 
 		/* If this is a result relation, already locked in InitPlan */
 		foreach(l, stmt->nonleafResultRelations)
@@ -903,9 +912,10 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 			}
 
 			if (rc && RowMarkRequiresRowShareLock(rc->markType))
-				LockRelationOid(relid, RowShareLock);
+				gotlock = LockRelationOid(relid, RowShareLock);
 			else
-				LockRelationOid(relid, AccessShareLock);
+				gotlock = LockRelationOid(relid, AccessShareLock);
+			Assert(!gotlock || IsParallelWorker());
 		}
 	}
 }
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 7b2dcb6..a1ebf64 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -100,8 +100,9 @@ SetLocktagRelationOid(LOCKTAG *tag, Oid relid)
  *
  * Lock a relation given only its OID.  This should generally be used
  * before attempting to open the relation's relcache entry.
+ * Return TRUE if we acquired a new lock, FALSE if already held.
  */
-void
+bool
 LockRelationOid(Oid relid, LOCKMODE lockmode)
 {
 	LOCKTAG		tag;
@@ -122,7 +123,11 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	 * CommandCounterIncrement, not here.)
 	 */
 	if (res != LOCKACQUIRE_ALREADY_HELD)
+	{
 		AcceptInvalidationMessages();
+		return true;
+	}
+	return false;
 }
 
 /*
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index a217de9..69e6f7f 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -37,7 +37,7 @@ typedef enum XLTW_Oper
 extern void RelationInitLockInfo(Relation relation);
 
 /* Lock a relation */
-extern void LockRelationOid(Oid relid, LOCKMODE lockmode);
+extern bool LockRelationOid(Oid relid, LOCKMODE lockmode);
 extern bool ConditionalLockRelationOid(Oid relid, LOCKMODE lockmode);
 extern void UnlockRelationId(LockRelId *relid, LOCKMODE lockmode);
 extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode);
#67Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#66)
Re: why partition pruning doesn't work?

On Sun, Jul 15, 2018 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel. That would protect
not only the executor, but a boatload of existing places that open
rels with NoLock on the currently-unverified assumption that a lock is
already held.

+1. In fact, maybe we ought to go a little further and have a
relation_reopen(oid, mode) that verifies that a lock in the specified
mode is held.

And then maybe we ought to go even further and start trying to get rid
of all the places where we reopen already-opened relations. A
distressing number of new patches add more places that do that, and
while I try to push back on those, I think they are proliferating, and
I think that they are not free. Granted, a hash table lookup is
pretty cheap, but if you do a sufficient number of them in
commonly-taken code paths, it's got to cost something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#68David Rowley
david.rowley@2ndquadrant.com
In reply to: Robert Haas (#67)
Re: why partition pruning doesn't work?

On 16 July 2018 at 12:12, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jul 15, 2018 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel. That would protect
not only the executor, but a boatload of existing places that open
rels with NoLock on the currently-unverified assumption that a lock is
already held.

+1. In fact, maybe we ought to go a little further and have a
relation_reopen(oid, mode) that verifies that a lock in the specified
mode is held.

Wouldn't it be better to just store the Relation indexed by its relid
somewhere the first time we opened it? Then just do a direct array
lookup on that rather than looking up by hashtable in syscache?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#69Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#68)
Re: why partition pruning doesn't work?

David Rowley <david.rowley@2ndquadrant.com> writes:

On 16 July 2018 at 12:12, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jul 15, 2018 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel.

Wouldn't it be better to just store the Relation indexed by its relid
somewhere the first time we opened it? Then just do a direct array
lookup on that rather than looking up by hashtable in syscache?

That would require carrying said array clear through from the parser to
the executor, plus we'd have some fun keeping it in sync with the RTE
changes that happen in the rewriter and planner, plus it's not entirely
clear where we'd close those relations in cases where the generated
plan isn't fed to the executor immediately (or gets copied in any way).
I don't think it's worth going that far.

I *do* think it might be worth behaving like that within the executor
by itself, and said so upthread. In that situation, we have a clear
place to do relation closes during ExecutorEnd, so it's not as messy
as a more general approach would be.

regards, tom lane

#70Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#68)
Re: why partition pruning doesn't work?

On Sun, Jul 15, 2018 at 8:24 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

+1. In fact, maybe we ought to go a little further and have a
relation_reopen(oid, mode) that verifies that a lock in the specified
mode is held.

Wouldn't it be better to just store the Relation indexed by its relid
somewhere the first time we opened it? Then just do a direct array
lookup on that rather than looking up by hashtable in syscache?

Yes, that would be better, IMHO anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#71Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#66)
Re: why partition pruning doesn't work?

On 2018/07/16 2:02, Tom Lane wrote:

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/06/19 2:05, Tom Lane wrote:

Or maybe what we should do is drop ExecLockNonLeafAppendTables/
ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
have to have all the partitioned tables (contained in partitioned_rels
fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
in a global list like rootResultRelations and nonleafResultRelations of
PlannedStmt.

Attached updated patch.

I've been looking at this patch, and while it's not unreasonable on its
own terms, I'm growing increasingly distressed at the ad-hoc and rather
duplicative nature of the data structures that have gotten stuck into
plan trees thanks to partitioning (the rootResultRelations and
nonleafResultRelations lists being prime examples).

It struck me this morning that a whole lot of the complication here is
solely due to needing to identify the right type of relation lock to take
during executor startup, and that THAT WORK IS TOTALLY USELESS.

I agree. IIUC, that's also the reason why PlannedStmt had to grow a
resultRelations field in the first place.

In every
case, we must already hold a suitable lock before we ever get to the
executor; either one acquired during the parse/plan pipeline, or one
re-acquired by AcquireExecutorLocks in the case of a cached plan.
Otherwise it's entirely possible that the plan has been invalidated by
concurrent DDL --- and it's not the executor's job to detect that and
re-plan; that *must* have been done upstream.

Moreover, it's important from a deadlock-avoidance standpoint that these
locks get acquired in the same order as they were acquired during the
initial parse/plan pipeline. I think it's reasonable to assume they were
acquired in RTE order in that stage, so AcquireExecutorLocks is OK. But,
if the current logic in the executor gets them in that order, it's both
non-obvious that it does so and horribly fragile if it does, seeing that
the responsibility for this is split across InitPlan,
ExecOpenScanRelation, and ExecLockNonLeafAppendTables.

So I'm thinking that what we really ought to do here is simplify executor
startup to just open all rels with NoLock, and get rid of any supporting
data structures that turn out to have no other use. (David Rowley's
nearby patch to create a properly hierarchical executor data structure for
partitioning info is likely to tie into this too, by removing some other
vestigial uses of those lists.)

I agree it would be nice to be able to get rid of those lists.

I think that this idea has been discussed in the past, and we felt at
the time that having the executor take its own locks was a good safety
measure, and a relatively cheap one since the lock manager is pretty
good at short-circuiting duplicative lock requests. But those are
certainly not free. Moreover, I'm not sure that this is really a
safety measure at all: if the executor were really taking any lock
not already held, it'd be masking a DDL hazard.

To investigate this further, I made the attached not-meant-for-commit
hack to verify whether InitPlan and related executor startup functions
were actually taking any not-previously-held locks. I could only find
one such case: the parser always opens tables selected FOR UPDATE with
RowShareLock, but if we end up implementing the resulting row mark
with ROW_MARK_COPY, the executor is acquiring just AccessShareLock
(because ExecOpenScanRelation thinks it needs to acquire some lock).
The patch as presented hacks ExecOpenScanRelation to avoid that, and
it passes check-world.

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel. That would protect
not only the executor, but a boatload of existing places that open
rels with NoLock on the currently-unverified assumption that a lock is
already held.

+1

I'm also rather strongly tempted to add a field to RangeTblEntry
that records the appropriate lock strength to take, so that we don't
have to rely on keeping AcquireExecutorLocks' logic to decide on the
lock type in sync with whatever the parse/plan pipeline does. (One
could then imagine adding assertions in the executor that this field
shows a lock strength of at least X, in place of actually opening
the rel with X.)

BTW, there'd be a lot to be said for having InitPlan just open all
the rels and build an array of Relation pointers that parallels the
RTE list, rather than doing heap_opens in random places elsewhere.

+1 to this. Actually I had written such a patch in place of the one I
posted on this thread, but wasn't confident enough that I had found and
fixed all the places in the executor that would use the Relation pointer
from there instead of fetching it themselves.

Thanks,
Amit

#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#71)
Re: why partition pruning doesn't work?

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/07/16 2:02, Tom Lane wrote:

BTW, there'd be a lot to be said for having InitPlan just open all
the rels and build an array of Relation pointers that parallels the
RTE list, rather than doing heap_opens in random places elsewhere.

+1 to this. Actually I had written such a patch in place of the one I
posted on this thread, but wasn't confident enough that I had found and
fixed all the places in the executor that would use the Relation pointer
from there instead of fetching it themselves.

Well, it doesn't have to be a 100% conversion, especially not to start
out with. As long as we ensure that common code paths through the
executor have only one open/close per rel, I don't think it matters
too much if there are some corner cases with more.

Also, tracking down places like that wouldn't be that hard to do. One
could instrument the relcache to report situations where a rel's refcount
gets above 1. (The infrastructure for logging backtraces that's being
discussed in other threads would help here.) We might decide that
particular cases weren't worth fixing --- eg if the extra open is
happening in code that doesn't have easy access to the EState --- but
I doubt that finding them is impractical.
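
A minimal sketch of that kind of instrumentation, assuming a toy relcache entry with a plain reference count; the names are hypothetical and a real version would log a backtrace rather than a one-line message:

```c
#include <assert.h>
#include <stdio.h>

/* Toy relcache entry; not the real RelationData. */
typedef struct ToyRelCacheEntry
{
	unsigned int relid;
	int			refcount;
} ToyRelCacheEntry;

/* Number of times any rel was opened while already open. */
static int	toy_multi_open_reports = 0;

/* Bump the refcount and report whenever it rises above 1. */
static void
toy_increment_refcount(ToyRelCacheEntry *rel)
{
	rel->refcount++;
	if (rel->refcount > 1)
	{
		toy_multi_open_reports++;
		fprintf(stderr, "relation %u opened again, refcount now %d\n",
				rel->relid, rel->refcount);
	}
}

/* Drop one reference; underflow would indicate an unmatched close. */
static void
toy_decrement_refcount(ToyRelCacheEntry *rel)
{
	assert(rel->refcount > 0);
	rel->refcount--;
}
```

Running the regression tests with such a report enabled would produce a list of call sites to examine, which could then be fixed or consciously left alone.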

regards, tom lane