Yet another issue with step generation in partition pruning

Started by Etsuro Fujitaover 5 years ago5 messages
#1Etsuro Fujita
etsuro.fujita@gmail.com
1 attachment(s)

Commit 13838740f fixed some issues with step generation in partition
pruning, but as I mentioned in [1]/messages/by-id/CAPmGK15=c8Q-Ac3ogzZp_d6VsfRYSL2tD8zLwy_WYdrMXQhiCQ@mail.gmail.com, I noticed that there is yet
another issue: get_steps_using_prefix() assumes that clauses in the
passed-in prefix list are sorted in ascending order of their partition
key numbers, but the caller (i.e., gen_prune_steps_from_opexps())
doesn’t ensure that in the case of range partitioning, leading to an
assertion failure. Here is an example causing such a failure, which
would happen with/without that commit:

create table rp_prefix_test2 (a int, b int, c int) partition by range (a, b, c);
create table rp_prefix_test2_p1 partition of rp_prefix_test2 for
values from (1, 1, 0) to (1, 1, 10);
create table rp_prefix_test2_p2 partition of rp_prefix_test2 for
values from (2, 2, 0) to (2, 2, 10);
select * from rp_prefix_test2 where a <= 1 and b <= 1 and b = 1 and c <= 0;

I don't think we write queries like this, but for this query, the
caller would create the prefix list for the last partition key “c”
{b=1, a<=1, b<=1} (the clauses are not sorted properly!), then calling
get_steps_using_prefix(), which leads to an assertion failure.
Attached is a patch for fixing this issue.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/CAPmGK15=c8Q-Ac3ogzZp_d6VsfRYSL2tD8zLwy_WYdrMXQhiCQ@mail.gmail.com

Attachments:

fix-pruning-step-generation.patchapplication/octet-stream; name=fix-pruning-step-generation.patchDownload
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 253c690649..6268623d56 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -1362,7 +1362,6 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
 				List	   *eq_clauses = btree_clauses[BTEqualStrategyNumber];
 				List	   *le_clauses = btree_clauses[BTLessEqualStrategyNumber];
 				List	   *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber];
-				bool		pk_has_clauses[PARTITION_MAX_KEYS];
 				int			strat;
 
 				/*
@@ -1382,10 +1381,15 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
 					foreach(lc, btree_clauses[strat])
 					{
 						PartClauseInfo *pc = lfirst(lc);
+						ListCell   *eq_start;
+						ListCell   *le_start;
+						ListCell   *ge_start;
 						ListCell   *lc1;
 						List	   *prefix = NIL;
 						List	   *pc_steps;
 						bool		prefix_valid = true;
+						bool		pk_has_clauses;
+						int			keyno;
 
 						/*
 						 * If this is a clause for the first partition key,
@@ -1410,79 +1414,96 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
 							continue;
 						}
 
-						/* (Re-)initialize the pk_has_clauses array */
-						Assert(pc->keyno > 0);
-						for (i = 0; i < pc->keyno; i++)
-							pk_has_clauses[i] = false;
+						eq_start = list_head(eq_clauses);
+						le_start = list_head(le_clauses);
+						ge_start = list_head(ge_clauses);
 
 						/*
-						 * Expressions from = clauses can always be in the
-						 * prefix, provided they're from an earlier key.
+						 * We arrange clauses into prefix in ascending order
+						 * of their partition key numbers.
 						 */
-						foreach(lc1, eq_clauses)
+						for (keyno = 0; keyno < pc->keyno; keyno++)
 						{
-							PartClauseInfo *eqpc = lfirst(lc1);
+							pk_has_clauses = false;
 
-							if (eqpc->keyno == pc->keyno)
-								break;
-							if (eqpc->keyno < pc->keyno)
+							/*
+							 * Expressions from = clauses can always be in the
+							 * prefix, provided they're from an earlier key.
+							 */
+							for_each_cell(lc1, eq_clauses, eq_start)
 							{
-								prefix = lappend(prefix, eqpc);
-								pk_has_clauses[eqpc->keyno] = true;
-							}
-						}
+								PartClauseInfo *eqpc = lfirst(lc1);
 
-						/*
-						 * If we're generating steps for </<= strategy, we can
-						 * add other <= clauses to the prefix, provided
-						 * they're from an earlier key.
-						 */
-						if (strat == BTLessStrategyNumber ||
-							strat == BTLessEqualStrategyNumber)
-						{
-							foreach(lc1, le_clauses)
-							{
-								PartClauseInfo *lepc = lfirst(lc1);
-
-								if (lepc->keyno == pc->keyno)
+								if (eqpc->keyno == keyno)
+								{
+									prefix = lappend(prefix, eqpc);
+									pk_has_clauses = true;
+								}
+								else
+								{
+									Assert(eqpc->keyno > keyno);
 									break;
-								if (lepc->keyno < pc->keyno)
+								}
+							}
+							eq_start = lc1;
+
+							/*
+							 * If we're generating steps for </<= strategy, we
+							 * can add other <= clauses to the prefix,
+							 * provided they're from an earlier key.
+							 */
+							if (strat == BTLessStrategyNumber ||
+								strat == BTLessEqualStrategyNumber)
+							{
+								for_each_cell(lc1, le_clauses, le_start)
 								{
-									prefix = lappend(prefix, lepc);
-									pk_has_clauses[lepc->keyno] = true;
+									PartClauseInfo *lepc = lfirst(lc1);
+
+									if (lepc->keyno == keyno)
+									{
+										prefix = lappend(prefix, lepc);
+										pk_has_clauses = true;
+									}
+									else
+									{
+										Assert(lepc->keyno > keyno);
+										break;
+									}
 								}
+								le_start = lc1;
 							}
-						}
 
-						/*
-						 * If we're generating steps for >/>= strategy, we can
-						 * add other >= clauses to the prefix, provided
-						 * they're from an earlier key.
-						 */
-						if (strat == BTGreaterStrategyNumber ||
-							strat == BTGreaterEqualStrategyNumber)
-						{
-							foreach(lc1, ge_clauses)
+							/*
+							 * If we're generating steps for >/>= strategy, we
+							 * can add other >= clauses to the prefix,
+							 * provided they're from an earlier key.
+							 */
+							if (strat == BTGreaterStrategyNumber ||
+								strat == BTGreaterEqualStrategyNumber)
 							{
-								PartClauseInfo *gepc = lfirst(lc1);
-
-								if (gepc->keyno == pc->keyno)
-									break;
-								if (gepc->keyno < pc->keyno)
+								for_each_cell(lc1, ge_clauses, ge_start)
 								{
-									prefix = lappend(prefix, gepc);
-									pk_has_clauses[gepc->keyno] = true;
+									PartClauseInfo *gepc = lfirst(lc1);
+
+									if (gepc->keyno == keyno)
+									{
+										prefix = lappend(prefix, gepc);
+										pk_has_clauses = true;
+									}
+									else
+									{
+										Assert(gepc->keyno > keyno);
+										break;
+									}
 								}
+								ge_start = lc1;
 							}
-						}
 
-						/*
-						 * Check whether every earlier partition key has at
-						 * least one clause.
-						 */
-						for (i = 0; i < pc->keyno; i++)
-						{
-							if (!pk_has_clauses[i])
+							/*
+							 * If this key has no clauses, prefix is not valid
+							 * anymore.
+							 */
+							if (!pk_has_clauses)
 							{
 								prefix_valid = false;
 								break;
@@ -2241,6 +2262,9 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
  * non-NULL, but they must ensure that prefix contains at least one clause
  * for each of the partition keys other than those specified in step_nullkeys
  * and step_lastkeyno.
+ *
+ * For both cases, callers must also ensure that clauses in prefix are sorted
+ * in ascending order of their partition key numbers.
  */
 static List *
 get_steps_using_prefix(GeneratePruningStepsContext *context,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 687cf8c5f4..c9979d6283 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3711,6 +3711,13 @@ explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b
    Filter: ((a >= 1) AND (b >= 1) AND (b >= 2) AND (c >= 2) AND (d >= 0))
 (2 rows)
 
+explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b = 2 and c = 2 and d >= 0;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Seq Scan on rp_prefix_test3_p2 rp_prefix_test3
+   Filter: ((a >= 1) AND (b >= 1) AND (d >= 0) AND (b = 2) AND (c = 2))
+(2 rows)
+
 create table hp_prefix_test (a int, b int, c int, d int) partition by hash (a part_test_int4_ops, b part_test_int4_ops, c part_test_int4_ops, d part_test_int4_ops);
 create table hp_prefix_test_p1 partition of hp_prefix_test for values with (modulus 2, remainder 0);
 create table hp_prefix_test_p2 partition of hp_prefix_test for values with (modulus 2, remainder 1);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 93ef9dc1f3..5bc8fb4506 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1080,6 +1080,8 @@ create table rp_prefix_test3_p2 partition of rp_prefix_test3 for values from (2,
 -- clauses for the partition key b (ie, b >= 1 and b >= 2)
 explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b >= 2 and c >= 2 and d >= 0;
 
+explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b = 2 and c = 2 and d >= 0;
+
 create table hp_prefix_test (a int, b int, c int, d int) partition by hash (a part_test_int4_ops, b part_test_int4_ops, c part_test_int4_ops, d part_test_int4_ops);
 create table hp_prefix_test_p1 partition of hp_prefix_test for values with (modulus 2, remainder 0);
 create table hp_prefix_test_p2 partition of hp_prefix_test for values with (modulus 2, remainder 1);
#2Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#1)
Re: Yet another issue with step generation in partition pruning

Fujita-san,

Thanks a lot for your time on fixing these multi-column range
partition pruning issues. I'm sorry that I failed to notice the
previous two reports on -bugs for which you committed a fix last week.

On Tue, Aug 4, 2020 at 9:46 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

Commit 13838740f fixed some issues with step generation in partition
pruning, but as I mentioned in [1], I noticed that there is yet
another issue: get_steps_using_prefix() assumes that clauses in the
passed-in prefix list are sorted in ascending order of their partition
key numbers, but the caller (i.e., gen_prune_steps_from_opexps())
doesn’t ensure that in the case of range partitioning, leading to an
assertion failure. Here is an example causing such a failure, which
would happen with/without that commit:

create table rp_prefix_test2 (a int, b int, c int) partition by range (a, b, c);
create table rp_prefix_test2_p1 partition of rp_prefix_test2 for
values from (1, 1, 0) to (1, 1, 10);
create table rp_prefix_test2_p2 partition of rp_prefix_test2 for
values from (2, 2, 0) to (2, 2, 10);
select * from rp_prefix_test2 where a <= 1 and b <= 1 and b = 1 and c <= 0;

I don't think we write queries like this, but for this query, the
caller would create the prefix list for the last partition key “c”
{b=1, a<=1, b<=1} (the clauses are not sorted properly!), then calling
get_steps_using_prefix(), which leads to an assertion failure.

That analysis is spot on.

Attached is a patch for fixing this issue.

I have looked at the patch and played around with it using the
regression tests you've added recently. I was not able to find any
results that looked surprising.

Thanks again.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#3Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#2)
Re: Yet another issue with step generation in partition pruning

Amit-san,

On Wed, Aug 5, 2020 at 5:13 PM Amit Langote <amitlangote09@gmail.com> wrote:

Thanks a lot for your time on fixing these multi-column range
partition pruning issues. I'm sorry that I failed to notice the
previous two reports on -bugs for which you committed a fix last week.

No problem.

On Tue, Aug 4, 2020 at 9:46 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

Attached is a patch for fixing this issue.

I have looked at the patch and played around with it using the
regression tests you've added recently. I was not able to find any
results that looked surprising.

That's good to hear! Thanks for reviewing! Will push the patch tomorrow.

Best regards,
Etsuro Fujita

#4Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Etsuro Fujita (#3)
Re: Yet another issue with step generation in partition pruning

On Thu, Aug 6, 2020 at 12:20 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

Will push the patch tomorrow.

Done. (I didn't have time for this, because I was terribly busy with
other stuff.)

Best regards,
Etsuro Fujita

#5Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Etsuro Fujita (#4)
Re: Yet another issue with step generation in partition pruning

On Fri, Aug 7, 2020 at 2:55 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Thu, Aug 6, 2020 at 12:20 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

Will push the patch tomorrow.

Done. (I didn't have time for this, because I was terribly busy with
other stuff.)

I mean I didn't have time for this *yesterday*.

Best regards,
Etsuro Fujita