Yet another issue with step generation in partition pruning
Commit 13838740f fixed some issues with step generation in partition
pruning, but as I mentioned in [1]/messages/by-id/CAPmGK15=c8Q-Ac3ogzZp_d6VsfRYSL2tD8zLwy_WYdrMXQhiCQ@mail.gmail.com, I noticed that there is yet
another issue: get_steps_using_prefix() assumes that clauses in the
passed-in prefix list are sorted in ascending order of their partition
key numbers, but the caller (i.e., gen_prune_steps_from_opexps())
doesn’t ensure that in the case of range partitioning, leading to an
assertion failure. Here is an example causing such a failure, which
would happen with/without that commit:
create table rp_prefix_test2 (a int, b int, c int) partition by range (a, b, c);
create table rp_prefix_test2_p1 partition of rp_prefix_test2 for
values from (1, 1, 0) to (1, 1, 10);
create table rp_prefix_test2_p2 partition of rp_prefix_test2 for
values from (2, 2, 0) to (2, 2, 10);
select * from rp_prefix_test2 where a <= 1 and b <= 1 and b = 1 and c <= 0;
I don't think we write queries like this, but for this query, the
caller would create the prefix list for the last partition key “c”
{b=1, a<=1, b<=1} (the clauses are not sorted properly!), then calling
get_steps_using_prefix(), which leads to an assertion failure.
Attached is a patch for fixing this issue.
Best regards,
Etsuro Fujita
[1]: /messages/by-id/CAPmGK15=c8Q-Ac3ogzZp_d6VsfRYSL2tD8zLwy_WYdrMXQhiCQ@mail.gmail.com
Attachments:
fix-pruning-step-generation.patchapplication/octet-stream; name=fix-pruning-step-generation.patchDownload
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 253c690649..6268623d56 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -1362,7 +1362,6 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
List *eq_clauses = btree_clauses[BTEqualStrategyNumber];
List *le_clauses = btree_clauses[BTLessEqualStrategyNumber];
List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber];
- bool pk_has_clauses[PARTITION_MAX_KEYS];
int strat;
/*
@@ -1382,10 +1381,15 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
foreach(lc, btree_clauses[strat])
{
PartClauseInfo *pc = lfirst(lc);
+ ListCell *eq_start;
+ ListCell *le_start;
+ ListCell *ge_start;
ListCell *lc1;
List *prefix = NIL;
List *pc_steps;
bool prefix_valid = true;
+ bool pk_has_clauses;
+ int keyno;
/*
* If this is a clause for the first partition key,
@@ -1410,79 +1414,96 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
continue;
}
- /* (Re-)initialize the pk_has_clauses array */
- Assert(pc->keyno > 0);
- for (i = 0; i < pc->keyno; i++)
- pk_has_clauses[i] = false;
+ eq_start = list_head(eq_clauses);
+ le_start = list_head(le_clauses);
+ ge_start = list_head(ge_clauses);
/*
- * Expressions from = clauses can always be in the
- * prefix, provided they're from an earlier key.
+ * We arrange clauses into prefix in ascending order
+ * of their partition key numbers.
*/
- foreach(lc1, eq_clauses)
+ for (keyno = 0; keyno < pc->keyno; keyno++)
{
- PartClauseInfo *eqpc = lfirst(lc1);
+ pk_has_clauses = false;
- if (eqpc->keyno == pc->keyno)
- break;
- if (eqpc->keyno < pc->keyno)
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ for_each_cell(lc1, eq_clauses, eq_start)
{
- prefix = lappend(prefix, eqpc);
- pk_has_clauses[eqpc->keyno] = true;
- }
- }
+ PartClauseInfo *eqpc = lfirst(lc1);
- /*
- * If we're generating steps for </<= strategy, we can
- * add other <= clauses to the prefix, provided
- * they're from an earlier key.
- */
- if (strat == BTLessStrategyNumber ||
- strat == BTLessEqualStrategyNumber)
- {
- foreach(lc1, le_clauses)
- {
- PartClauseInfo *lepc = lfirst(lc1);
-
- if (lepc->keyno == pc->keyno)
+ if (eqpc->keyno == keyno)
+ {
+ prefix = lappend(prefix, eqpc);
+ pk_has_clauses = true;
+ }
+ else
+ {
+ Assert(eqpc->keyno > keyno);
break;
- if (lepc->keyno < pc->keyno)
+ }
+ }
+ eq_start = lc1;
+
+ /*
+ * If we're generating steps for </<= strategy, we
+ * can add other <= clauses to the prefix,
+ * provided they're from an earlier key.
+ */
+ if (strat == BTLessStrategyNumber ||
+ strat == BTLessEqualStrategyNumber)
+ {
+ for_each_cell(lc1, le_clauses, le_start)
{
- prefix = lappend(prefix, lepc);
- pk_has_clauses[lepc->keyno] = true;
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == keyno)
+ {
+ prefix = lappend(prefix, lepc);
+ pk_has_clauses = true;
+ }
+ else
+ {
+ Assert(lepc->keyno > keyno);
+ break;
+ }
}
+ le_start = lc1;
}
- }
- /*
- * If we're generating steps for >/>= strategy, we can
- * add other >= clauses to the prefix, provided
- * they're from an earlier key.
- */
- if (strat == BTGreaterStrategyNumber ||
- strat == BTGreaterEqualStrategyNumber)
- {
- foreach(lc1, ge_clauses)
+ /*
+ * If we're generating steps for >/>= strategy, we
+ * can add other >= clauses to the prefix,
+ * provided they're from an earlier key.
+ */
+ if (strat == BTGreaterStrategyNumber ||
+ strat == BTGreaterEqualStrategyNumber)
{
- PartClauseInfo *gepc = lfirst(lc1);
-
- if (gepc->keyno == pc->keyno)
- break;
- if (gepc->keyno < pc->keyno)
+ for_each_cell(lc1, ge_clauses, ge_start)
{
- prefix = lappend(prefix, gepc);
- pk_has_clauses[gepc->keyno] = true;
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == keyno)
+ {
+ prefix = lappend(prefix, gepc);
+ pk_has_clauses = true;
+ }
+ else
+ {
+ Assert(gepc->keyno > keyno);
+ break;
+ }
}
+ ge_start = lc1;
}
- }
- /*
- * Check whether every earlier partition key has at
- * least one clause.
- */
- for (i = 0; i < pc->keyno; i++)
- {
- if (!pk_has_clauses[i])
+ /*
+ * If this key has no clauses, prefix is not valid
+ * anymore.
+ */
+ if (!pk_has_clauses)
{
prefix_valid = false;
break;
@@ -2241,6 +2262,9 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
* non-NULL, but they must ensure that prefix contains at least one clause
* for each of the partition keys other than those specified in step_nullkeys
* and step_lastkeyno.
+ *
+ * For both cases, callers must also ensure that clauses in prefix are sorted
+ * in ascending order of their partition key numbers.
*/
static List *
get_steps_using_prefix(GeneratePruningStepsContext *context,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 687cf8c5f4..c9979d6283 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3711,6 +3711,13 @@ explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b
Filter: ((a >= 1) AND (b >= 1) AND (b >= 2) AND (c >= 2) AND (d >= 0))
(2 rows)
+explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b = 2 and c = 2 and d >= 0;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Seq Scan on rp_prefix_test3_p2 rp_prefix_test3
+ Filter: ((a >= 1) AND (b >= 1) AND (d >= 0) AND (b = 2) AND (c = 2))
+(2 rows)
+
create table hp_prefix_test (a int, b int, c int, d int) partition by hash (a part_test_int4_ops, b part_test_int4_ops, c part_test_int4_ops, d part_test_int4_ops);
create table hp_prefix_test_p1 partition of hp_prefix_test for values with (modulus 2, remainder 0);
create table hp_prefix_test_p2 partition of hp_prefix_test for values with (modulus 2, remainder 1);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 93ef9dc1f3..5bc8fb4506 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1080,6 +1080,8 @@ create table rp_prefix_test3_p2 partition of rp_prefix_test3 for values from (2,
-- clauses for the partition key b (ie, b >= 1 and b >= 2)
explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b >= 2 and c >= 2 and d >= 0;
+explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b = 2 and c = 2 and d >= 0;
+
create table hp_prefix_test (a int, b int, c int, d int) partition by hash (a part_test_int4_ops, b part_test_int4_ops, c part_test_int4_ops, d part_test_int4_ops);
create table hp_prefix_test_p1 partition of hp_prefix_test for values with (modulus 2, remainder 0);
create table hp_prefix_test_p2 partition of hp_prefix_test for values with (modulus 2, remainder 1);
Fujita-san,
Thanks a lot for your time on fixing these multi-column range
partition pruning issues. I'm sorry that I failed to notice the
previous two reports on -bugs for which you committed a fix last week.
On Tue, Aug 4, 2020 at 9:46 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Commit 13838740f fixed some issues with step generation in partition
pruning, but as I mentioned in [1], I noticed that there is yet
another issue: get_steps_using_prefix() assumes that clauses in the
passed-in prefix list are sorted in ascending order of their partition
key numbers, but the caller (i.e., gen_prune_steps_from_opexps())
doesn’t ensure that in the case of range partitioning, leading to an
assertion failure. Here is an example causing such a failure, which
would happen with/without that commit:create table rp_prefix_test2 (a int, b int, c int) partition by range (a, b, c);
create table rp_prefix_test2_p1 partition of rp_prefix_test2 for
values from (1, 1, 0) to (1, 1, 10);
create table rp_prefix_test2_p2 partition of rp_prefix_test2 for
values from (2, 2, 0) to (2, 2, 10);
select * from rp_prefix_test2 where a <= 1 and b <= 1 and b = 1 and c <= 0;I don't think we write queries like this, but for this query, the
caller would create the prefix list for the last partition key “c”
{b=1, a<=1, b<=1} (the clauses are not sorted properly!), then calling
get_steps_using_prefix(), which leads to an assertion failure.
That analysis is spot on.
Attached is a patch for fixing this issue.
I have looked at the patch and played around with it using the
regression tests you've added recently. I was not able to find any
results that looked surprising.
Thanks again.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Amit-san,
On Wed, Aug 5, 2020 at 5:13 PM Amit Langote <amitlangote09@gmail.com> wrote:
Thanks a lot for your time on fixing these multi-column range
partition pruning issues. I'm sorry that I failed to notice the
previous two reports on -bugs for which you committed a fix last week.
No problem.
On Tue, Aug 4, 2020 at 9:46 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Attached is a patch for fixing this issue.
I have looked at the patch and played around with it using the
regression tests you've added recently. I was not able to find any
results that looked surprising.
That's good to hear! Thanks for reviewing! Will push the patch tomorrow.
Best regards,
Etsuro Fujita
On Thu, Aug 6, 2020 at 12:20 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Will push the patch tomorrow.
Done. (I didn't have time for this, because I was terribly busy with
other stuff.)
Best regards,
Etsuro Fujita
On Fri, Aug 7, 2020 at 2:55 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Thu, Aug 6, 2020 at 12:20 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Will push the patch tomorrow.
Done. (I didn't have time for this, because I was terribly busy with
other stuff.)
I mean I didn't have time for this *yesterday*.
Best regards,
Etsuro Fujita