Unexpected (wrong?) result querying boolean partitioned table with NULL partition
Hi Hackers,
Is it fair to assume that, given the same data, a partitioned table should
return the same results as a non-partitioned table? If that's true, then I
think I may have stumbled across a case of wrong results on boolean partitioned
tables.
In following example, I think we incorrectly skip the default partition scan:
CREATE TABLE boolpart (a bool) PARTITION BY LIST (a);
CREATE TABLE boolpart_default PARTITION OF boolpart default;
CREATE TABLE boolpart_t PARTITION OF boolpart FOR VALUES IN ('true');
CREATE TABLE boolpart_f PARTITION OF boolpart FOR VALUES IN ('false');
INSERT INTO boolpart VALUES (true), (false), (null);
EXPLAIN SELECT * FROM boolpart WHERE a IS NOT true;
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on boolpart_f boolpart (cost=0.00..38.10 rows=1405 width=1)
Filter: (a IS NOT TRUE)
(2 rows)
SELECT * FROM boolpart WHERE a IS NOT true;
a
---
f
(1 row)
Compare that to the result of a non-partitioned table:
CREATE TABLE booltab (a bool);
INSERT INTO booltab VALUES (true), (false), (null);
EXPLAIN SELECT * FROM booltab WHERE a IS NOT true;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on booltab (cost=0.00..38.10 rows=1405 width=1)
Filter: (a IS NOT TRUE)
(2 rows)
SELECT * FROM booltab WHERE a IS NOT true;
a
---
f
(2 rows)
I think the issue has to do with assumptions made about boolean test IS NOT
inequality logic which is different from inequality of other operators.
Specifically, "true IS NOT NULL" is not the same as "true<>NULL".
In partition pruning, match_boolean_partition_clause() tries to match partkey
with clause and outputs PARTCLAUSE_MATCH_CLAUSE and an outconst TRUE for
(IS_TRUE or IS_NOT_FALSE) and inversely FALSE for (IS_FALSE or IS_NOT_TRUE).
However, I don't think this gradularity is sufficient for "IS NOT" logic when a
NULL value partition is present.
One idea is to use the negation operator for IS_NOT_(true|false) (i.e.
BooleanNotEqualOperator instead of BooleanEqualOperator). But besides
presumably being a more expensive operation, not equal is not part of the btree
opfamily for bool_ops. So, seems like that won't really fit into the current
partition pruning framework.
Then I realized that the issue is just about adding the default or null
partition in these very particular scenarios. And struct PartitionBoundInfoData
already holds that information. So if we can identify these scenarios and pass
that information into get_matching_partitions() then we can add the necessary
partitions. Attached is a very rough sketch of that idea.
Thoughts? Does this seem like a legit issue? And if so, do either of the
proposed solutions seem reasonable?
Thanks,
David
Attachments:
boolpart_scan_nullpart.patchapplication/octet-stream; name=boolpart_scan_nullpart.patchDownload
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..5b138e1c12 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -122,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
+ bool has_is_not_op;
} GeneratePruningStepsContext;
/* The result of performing one PartitionPruneStep */
@@ -201,7 +202,8 @@ static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *cont
static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
Expr *clause,
Expr *partkey,
- Expr **outconst);
+ Expr **outconst,
+ bool *has_is_not_op);
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
@@ -776,6 +778,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
if (!enable_partition_pruning || clauses == NIL)
return bms_add_range(NULL, 0, rel->nparts - 1);
+ gcontext.has_is_not_op = false;
+
/*
* Process clauses to extract pruning steps that are usable at plan time.
* If the clauses are found to be contradictory, we can return the empty
@@ -807,6 +811,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
context.planstate = NULL;
context.exprcontext = NULL;
context.exprstates = NULL;
+ context.has_is_not_op = gcontext.has_is_not_op;
/* Actual pruning happens here. */
return get_matching_partitions(&context, pruning_steps);
@@ -910,6 +915,10 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
result = bms_add_member(result, partindex);
}
+ if (context->has_is_not_op && context->boundinfo->null_index != -1)
+ result = bms_add_member(result, context->boundinfo->null_index);
+ else if (context->has_is_not_op)
+ scan_default |= partition_bound_has_default(context->boundinfo);
/* Add the null and/or default partition if needed and present. */
if (final_result->scan_null)
@@ -1814,7 +1823,7 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
* Recognize specially shaped clauses that match a Boolean partition key.
*/
boolmatchstatus = match_boolean_partition_clause(partopfamily, clause,
- partkey, &expr);
+ partkey, &expr, &context->has_is_not_op);
if (boolmatchstatus == PARTCLAUSE_MATCH_CLAUSE)
{
@@ -3596,7 +3605,7 @@ perform_pruning_combine_step(PartitionPruneContext *context,
*/
static PartClauseMatchStatus
match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
- Expr **outconst)
+ Expr **outconst, bool *has_is_not_op)
{
Expr *leftop;
@@ -3618,6 +3627,10 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
btest->booltesttype == IS_NOT_UNKNOWN)
return PARTCLAUSE_UNSUPPORTED;
+ if (btest->booltesttype == IS_NOT_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ *has_is_not_op = true;
+
leftop = btest->arg;
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c0d6889d47..d8f72a0146 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -59,6 +59,7 @@ typedef struct PartitionPruneContext
PlanState *planstate;
ExprContext *exprcontext;
ExprState **exprstates;
+ bool has_is_not_op;
} PartitionPruneContext;
/*
@@ -70,10 +71,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern int make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index d700c00629..55cc528e55 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1070,20 +1070,25 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t boolpart_2
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(5 rows)
+ -> Seq Scan on boolpart_default boolpart_3
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+(7 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
----------------------------------
- Seq Scan on boolpart_f boolpart
- Filter: (a IS NOT TRUE)
-(2 rows)
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f boolpart_1
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_default boolpart_2
+ Filter: (a IS NOT TRUE)
+(5 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------
- Result
- One-Time Filter: false
+ QUERY PLAN
+--------------------------------------------------
+ Seq Scan on boolpart_default boolpart
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
(2 rows)
explain (costs off) select * from boolpart where a is unknown;
On Wed, 12 Apr 2023 at 22:13, David Kimura <david.g.kimura@gmail.com> wrote:
Is it fair to assume that, given the same data, a partitioned table should
return the same results as a non-partitioned table?
Yes, and also the same as when enable_partition_pruning is set to off.
CREATE TABLE boolpart (a bool) PARTITION BY LIST (a);
CREATE TABLE boolpart_default PARTITION OF boolpart default;
CREATE TABLE boolpart_t PARTITION OF boolpart FOR VALUES IN ('true');
CREATE TABLE boolpart_f PARTITION OF boolpart FOR VALUES IN ('false');
INSERT INTO boolpart VALUES (true), (false), (null);EXPLAIN SELECT * FROM boolpart WHERE a IS NOT true;
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on boolpart_f boolpart (cost=0.00..38.10 rows=1405 width=1)
Filter: (a IS NOT TRUE)
(2 rows)SELECT * FROM boolpart WHERE a IS NOT true;
a
---
f
(1 row)Compare that to the result of a non-partitioned table:
CREATE TABLE booltab (a bool);
INSERT INTO booltab VALUES (true), (false), (null);EXPLAIN SELECT * FROM booltab WHERE a IS NOT true;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on booltab (cost=0.00..38.10 rows=1405 width=1)
Filter: (a IS NOT TRUE)
(2 rows)SELECT * FROM booltab WHERE a IS NOT true;
a
---
f
Ouch. That's certainly not correct.
I think the issue has to do with assumptions made about boolean test IS NOT
inequality logic which is different from inequality of other operators.
Specifically, "true IS NOT NULL" is not the same as "true<>NULL".
Yeah, that's wrong.
One idea is to use the negation operator for IS_NOT_(true|false) (i.e.
BooleanNotEqualOperator instead of BooleanEqualOperator). But besides
presumably being a more expensive operation, not equal is not part of the btree
opfamily for bool_ops. So, seems like that won't really fit into the current
partition pruning framework.
There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.
Effectively, I think that's the attached patch.
There seems to be a bunch of tests checking this already, all of them
assuming the incorrect plans.
David
Attachments:
fix_partprune_handling_of_NOT_TRUE_and_NOT_FALSE_boolean_quals.patchapplication/octet-stream; name=fix_partprune_handling_of_NOT_TRUE_and_NOT_FALSE_boolean_quals.patchDownload
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..8e36bbe419 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -201,7 +201,8 @@ static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *cont
static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
Expr *clause,
Expr *partkey,
- Expr **outconst);
+ Expr **outconst,
+ bool *noteq);
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
@@ -1809,12 +1810,12 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
Oid partopfamily = part_scheme->partopfamily[partkeyidx],
partcoll = part_scheme->partcollation[partkeyidx];
Expr *expr;
-
+ bool noteq;
/*
* Recognize specially shaped clauses that match a Boolean partition key.
*/
boolmatchstatus = match_boolean_partition_clause(partopfamily, clause,
- partkey, &expr);
+ partkey, &expr, ¬eq);
if (boolmatchstatus == PARTCLAUSE_MATCH_CLAUSE)
{
@@ -1824,7 +1825,7 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
partclause->keyno = partkeyidx;
/* Do pruning with the Boolean equality operator. */
partclause->opno = BooleanEqualOperator;
- partclause->op_is_ne = false;
+ partclause->op_is_ne = noteq;
partclause->expr = expr;
/* We know that expr is of Boolean type. */
partclause->cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
@@ -3596,11 +3597,12 @@ perform_pruning_combine_step(PartitionPruneContext *context,
*/
static PartClauseMatchStatus
match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
- Expr **outconst)
+ Expr **outconst, bool *noteq)
{
Expr *leftop;
*outconst = NULL;
+ *noteq = false;
/*
* Partitioning currently can only use built-in AMs, so checking for
@@ -3623,11 +3625,26 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
leftop = ((RelabelType *) leftop)->arg;
if (equal(leftop, partkey))
- *outconst = (btest->booltesttype == IS_TRUE ||
- btest->booltesttype == IS_NOT_FALSE)
- ? (Expr *) makeBoolConst(true, false)
- : (Expr *) makeBoolConst(false, false);
-
+ {
+ switch (btest->booltesttype)
+ {
+ case IS_NOT_TRUE:
+ *noteq = true;
+ /* fall through */
+ case IS_TRUE:
+ *outconst = (Expr *) makeBoolConst(true, false);
+ break;
+ case IS_NOT_FALSE:
+ *noteq = true;
+ /* fall through */
+ case IS_FALSE:
+ *outconst = (Expr *) makeBoolConst(false, false);
+ break;
+ default:
+ Assert(false); /* hmm? */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
if (*outconst)
return PARTCLAUSE_MATCH_CLAUSE;
}
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index d700c00629..55cc528e55 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1070,20 +1070,25 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t boolpart_2
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(5 rows)
+ -> Seq Scan on boolpart_default boolpart_3
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+(7 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
----------------------------------
- Seq Scan on boolpart_f boolpart
- Filter: (a IS NOT TRUE)
-(2 rows)
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f boolpart_1
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_default boolpart_2
+ Filter: (a IS NOT TRUE)
+(5 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------
- Result
- One-Time Filter: false
+ QUERY PLAN
+--------------------------------------------------
+ Seq Scan on boolpart_default boolpart
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
(2 rows)
explain (costs off) select * from boolpart where a is unknown;
On Wed, Apr 12, 2023 at 4:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 12 Apr 2023 at 22:13, David Kimura <david.g.kimura@gmail.com> wrote:
Is it fair to assume that, given the same data, a partitioned table should
return the same results as a non-partitioned table?Yes, and also the same as when enable_partition_pruning is set to off.
Thanks for making me aware of that GUC.
One idea is to use the negation operator for IS_NOT_(true|false) (i.e.
BooleanNotEqualOperator instead of BooleanEqualOperator). But besides
presumably being a more expensive operation, not equal is not part of the btree
opfamily for bool_ops. So, seems like that won't really fit into the current
partition pruning framework.There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.
Ah, I missed that when I first tried to implement that approach. Indeed, this
seems cleaner. Also, the domain space for boolean partitions is very small, so
any added cost for searching not equal seems negligible.
Thanks,
David
On Wed, Apr 12, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.Effectively, I think that's the attached patch.
I think there is a thinko here.
+ switch (btest->booltesttype)
+ {
+ case IS_NOT_TRUE:
+ *noteq = true;
+ /* fall through */
+ case IS_TRUE:
+ *outconst = (Expr *) makeBoolConst(true, false);
+ break;
+ case IS_NOT_FALSE:
+ *noteq = true;
+ /* fall through */
+ case IS_FALSE:
+ *outconst = (Expr *) makeBoolConst(false, false);
+ break;
+ default:
+ Assert(false); /* hmm? */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
The *outconst should be set to true in case IS_NOT_FALSE and set to
false in case IS_NOT_TRUE, something like:
switch (btest->booltesttype)
{
- case IS_NOT_TRUE:
+ case IS_NOT_FALSE:
*noteq = true;
/* fall through */
case IS_TRUE:
*outconst = (Expr *) makeBoolConst(true, false);
break;
- case IS_NOT_FALSE:
+ case IS_NOT_TRUE:
*noteq = true;
/* fall through */
case IS_FALSE:
Thanks
Richard
On Thu, Apr 13, 2023 at 10:39 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Apr 12, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.Effectively, I think that's the attached patch.
I think there is a thinko here.
Sorry. It's my thinko. In cases IS_NOT_TRUE and IS_NOT_FALSE the
op_is_ne is set to true. So the logic in origin patch is right.
BTW, I wonder if we should elog an Error here.
default:
- Assert(false); /* hmm? */
- return PARTCLAUSE_UNSUPPORTED;
+ elog(ERROR, "unrecognized booltesttype: %d",
+ (int) btest->booltesttype);
+ break;
Otherwise the patch looks good to me.
Thanks
Richard
On Thu, 13 Apr 2023 at 15:30, Richard Guo <guofenglinux@gmail.com> wrote:
BTW, I wonder if we should elog an Error here.
default: - Assert(false); /* hmm? */ - return PARTCLAUSE_UNSUPPORTED; + elog(ERROR, "unrecognized booltesttype: %d", + (int) btest->booltesttype); + break;
I wondered about that, hence my not-so-commitable comment left in there.
My last thoughts were that maybe we should just move the IS_UNKNOWN
and IS_NOT_UNKNOWN down into the switch and let -Wall let us know if
something is missing.
It hardly seems worth keeping the slightly earlier exit for those two
cases. That just amounts to the RelabelType check and is this the
partition key. I doubt IS[_NOT]_UNKNOWN is common enough for us to
warrant contorting the code to make it a few dozen nanoseconds faster.
Having smaller code is probably more of a win, which we'd get if we
didn't add the ERROR you propose.
Otherwise the patch looks good to me.
Thanks for having a look.
David
On Wed, Apr 12, 2023 at 4:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
There seems to be a bunch of tests checking this already, all of them
assuming the incorrect plans.
Given that the plan alone wasn't sufficient to catch this error previously,
would it be worthwhile to add some data to the tests to make it abundantly
obvious?
I had noticed that the default partition seems to be an edge case in the code.
Perhaps it's overkill, but would it be worth adding a test where the NULL
partition is not the default?
Thanks,
David
On Thu, 13 Apr 2023 at 15:45, David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 13 Apr 2023 at 15:30, Richard Guo <guofenglinux@gmail.com> wrote:
BTW, I wonder if we should elog an Error here.
default: - Assert(false); /* hmm? */ - return PARTCLAUSE_UNSUPPORTED; + elog(ERROR, "unrecognized booltesttype: %d", + (int) btest->booltesttype); + break;I wondered about that, hence my not-so-commitable comment left in there.
My last thoughts were that maybe we should just move the IS_UNKNOWN
and IS_NOT_UNKNOWN down into the switch and let -Wall let us know if
something is missing.It hardly seems worth keeping the slightly earlier exit for those two
cases. That just amounts to the RelabelType check and is this the
partition key. I doubt IS[_NOT]_UNKNOWN is common enough for us to
warrant contorting the code to make it a few dozen nanoseconds faster.
Having smaller code is probably more of a win, which we'd get if we
didn't add the ERROR you propose.
After having looked at the code in more detail, I don't think it's a
good idea to move the IS_UNKNOWN and IS_NOT_UNKNOWN down into the
switch. Having them tested early means we can return
PARTCLAUSE_UNSUPPORTED even when the clause does not match the current
partition key. If we moved those into the switch statement, then if
the qual didn't match to the partition key, then we'd return
PARTCLAUSE_NOMATCH and we'd maybe waste further effort later trying to
match the same qual to some other partition key.
All I ended up doing was removing the Assert(). I don't really see
the need to add an ERROR. It's not like any other value would cause
the code to misbehave. We'll just return PARTCLAUSE_UNSUPPORTED and
no pruning would get done for that qual. I also struggle to imagine
what possible other values we could ever add to BoolTestType.
After looking a bit deeper and testing a bit more, I found another bug
in match_boolean_partition_clause() around the
equal(negate_clause((Node *) leftop), partkey). The code there just
always set *outconst to a false Const regardless of is_not_clause. I
see the code coverage tool shows that line as untested, so I fixed the
bug and wrote some tests to exercise the code.
As David Kimura suggested, I also added some data to the tables in
question and repeated the same queries again without the EXPLAIN. I
generated the expected output with enable_partition_pruning = off then
put it back on again and saw that the same results are shown. I
considered writing a plpgsql function that we can pass a table name
and a query and it goes and makes a temp table, populates it with the
query with enable_partition_pruning = off then tries again with
pruning on and verifies the results are the same as what's stored in
the temp table. I'll maybe go and do that for master only, it's just a
bit more than what I wanted to do in the back branches.
I've pushed the fix now.
Thanks for the report about this, David, and thank you both for the reviews.
David
David