Proving IS NOT NULL inference for ScalarArrayOpExpr's
I've recently been investigating improving our plans for queries like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000);
where the table "t" has a partial index on "foo" where "foo IS NOT NULL".
Currently the planner generates an index [only] scan so long as the number of items in the IN expression is <= 100, but as soon as you add the 101st item it reverts to a seq scan. If we add the explicit null check like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
then we go back to the desired index scan.
This occurs because predtest.c stops expanding ScalarArrayOpExprs with constant array arguments into OR trees when the array size is > 100. The rest of the predicate proving code then becomes unable to infer that foo is not null, and therefore the planner cannot prove that the partial index is correct to use.
(Please pardon technical details in the background below that may be way off; I don't have a lot of experience with the Postgres codebase yet, and am still trying to build a mental model of things.)
At first I was imagining having the parser keep track of whether an array const expr contained any nulls and perhaps adding generated quals (in an equivalence class?) to allow the planner to easily prove the index was correct. I'd been going down this track because in my mind the issue was that the planner needed to verify whether all of the array elements were not null.
But as I started to dig into the predtest.c NOT NULL proofs and add test cases, I realized that at least in many normal op cases we can safely infer that foo is not null when "foo <op> <array>" is true, even if the array contains null elements.
This is such a simple change that it seems like I must be missing a case where the above doesn't hold true, but I can't immediately think of any, and indeed with the attached patch all existing tests pass (including some additional ones I added for predtest to play around with it).
Am I missing something obvious? Is this a valid approach?
Other outstanding questions:
Should I add additional tests for predtest? It already seems to cover some null test cases with scalar array ops, but I'd be happy to add more if desired.
Should I add a test case for the resulting plan with "foo IN (...)" with an array of more than 100 elements?
Thanks,
James Coleman
Attachments:
saop_is_not_null-v1.patch (application/octet-stream, +6 −0)
I've become more confident that this approach is correct after
discussions with others on my team and have added the patch to the
open commitfest.
I'm attaching v2 of the patch here with a regression test (that fails
on current master, but is green both with my patch and with current
master if you remove items from the test array to make the array <=
100 items) as well as a comment detailing the reasoning in predtest.c.
On Sat, Nov 10, 2018 at 4:33 PM James Coleman <jtc331@gmail.com> wrote:
Attachments:
saop_is_not_null-v2.patch (application/octet-stream, +47 −0)
Also, my apologies for top posting; I forgot to remove the old email
before clicking send.
Is there anything I can do to solicit reviewers for the current CF?
The patch is quite small and should be fairly simple to review.
- James
James Coleman <jtc331@gmail.com> writes:
Is there anything I can do to solicit reviewers for the current CF?
There is no active CF, which is why nothing is happening. We'll
start looking at the 2019-01 items in January. Right now, people
are either on vacation or working on their own patches.
regards, tom lane
On Sun, 11 Nov 2018 at 10:33, James Coleman <jtc331@gmail.com> wrote:
At first I was imagining having the parser keep track of whether an array const expr contained any nulls and perhaps adding generated quals (in an equivalence class?) to allow the planner to easily prove the index was correct. I'd been going down this track because in my mind the issue was that the planner needed to verify whether all of the array elements were not null.

But as I started to dig into the predtest.c NOT NULL proofs and add test cases, I realized that at least in many normal op cases we can safely infer that foo is not null when "foo <op> <array>" is true, even if the array contains null elements.

This is such a simple change that it seems like I must be missing a case where the above doesn't hold true, but I can't immediately think of any, and indeed with the attached patch all existing tests pass (including some additional ones I added for predtest to play around with it).

Am I missing something obvious? Is this a valid approach?
I started looking at this patch and I see that there's no check to
ensure that the IS NOT NULL pred matches the left operand in the IN()
clause.
Here's a demo of why this is bad:
create table t (a int, b int);
create index t_a_is_not_null on t(a) where a is not null;
insert into t values(null,1); -- not indexed
set enable_seqscan=0; -- force index to be used whenever possible
select * from t where b
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- row missing!
a | b
---+---
(0 rows)
explain select * from t where b
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98
,99,100,101); -- should not use t_a_is_not_null index.
QUERY PLAN
--------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=4.42..320.84 rows=1141 width=8)
Recheck Cond: (a IS NOT NULL)
Filter: (b = ANY
('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101}'::integer[]))
-> Bitmap Index Scan on t_a_is_not_null (cost=0.00..4.13 rows=2249 width=0)
(4 rows)
set enable_seqscan=1;
SET
select * from t where b
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- Correct!
a | b
---+---
| 1
(1 row)
Basically, the planner assumes that WHERE b IN(1,...,101) implies the index's WHERE a IS NOT NULL predicate, which is definitely not the case.
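The broken proof can be modeled directly (a sketch in Python, not planner code, with None standing in for SQL NULL): truth of "b IN (...)" says nothing about a, so a row with NULL a satisfies the query while being absent from the partial index.

```python
# Model of the demo row: b satisfies the IN list while a IS NULL, so the
# row matches the query but is not covered by the partial index on a.
row = {"a": None, "b": 1}

b_in_list = row["b"] in range(1, 102)    # b = ANY('{1,...,101}') is TRUE
a_not_null = row["a"] is not None        # a IS NOT NULL is FALSE

assert b_in_list and not a_not_null      # the claimed implication fails
```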
Probably your new code needs to be expanded to become:
if (IsA(clause, ScalarArrayOpExpr))
{
    ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;

    if (op_strict(saop->opno) &&
        clause_is_strict_for((Node *) linitial(saop->args), subexpr))
        return true;
}
Other outstanding questions:
Should I add additional tests for predtest? It already seems to cover some null test cases with scalar array ops, but I'd be happy to add more if desired.

Should I add a test case for the resulting plan with "foo IN (...)" with an array of more than 100 elements?
I've not looked in detail, but perhaps the place to put the tests is src/test/modules/test_predtest. This module adds a function named test_predtest() that you can likely use more easily than trying to EXPLAIN plans and hoping the planner's behaviour helps to exhibit the behaviour of the changed code.
I'll mark this as waiting on author.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, 13 Jan 2019 at 14:49, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've not looked in detail, but perhaps the place to put the tests is src/test/modules/test_predtest. This module adds a function named test_predtest() that you can likely use more easily than trying to EXPLAIN plans and hoping the planner's behaviour helps to exhibit the behaviour of the changed code.
I just noticed that I accidentally reviewed v1 instead of v2. I see
you found that module. I'll keep this as waiting on author until the
other bug is fixed.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jan 12, 2019 at 8:52 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
Basically, the planner assumes that WHERE b IN(1,...,101) implies the index's WHERE a IS NOT NULL predicate, which is definitely not the case.

Probably your new code needs to be expanded to become:

if (IsA(clause, ScalarArrayOpExpr))
{
    ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;

    if (op_strict(saop->opno) &&
        clause_is_strict_for((Node *) linitial(saop->args), subexpr))
        return true;
}
I've updated the code as well as included the sad path test in
addition to my original happy path test.
Patch v3 attached.
James Coleman
Attachments:
saop_is_not_null-v3.patch (application/octet-stream, +79 −0)
James Coleman <jtc331@gmail.com> writes:
[ saop_is_not_null-v3.patch ]
There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.
Otherwise you're up against the fact that the OR of zero boolean
expressions is false, not null:
regression=# select null::int = any(array[]::int[]);
?column?
----------
f
(1 row)
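The empty-array behavior falls out of treating the ScalarArrayOpExpr as an OR over per-element comparisons: the OR of zero operands is FALSE. A minimal three-valued model (Python, None standing in for SQL NULL; illustrative, not planner code):

```python
def sql_or(operands):
    """Three-valued OR over a list of True/False/None operands."""
    if any(v is True for v in operands):
        return True
    if any(v is None for v in operands):
        return None
    return False            # includes the empty case: OR() = FALSE

# With a strict "=" and a NULL LHS, every per-element comparison is NULL...
comparisons = [None, None, None]       # NULL = 1, NULL = 2, NULL = 3
assert sql_or(comparisons) is None     # NULL = ANY('{1,2,3}') is NULL

# ...but with an empty array there are zero comparisons:
assert sql_or([]) is False             # NULL = ANY('{}') is FALSE, not NULL
```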
While this is doable when the RHS is an array constant or ArrayExpr, I'm
afraid it's likely to be a bit tedious. (Hm, actually predicate_classify
already has logic to extract the number of elements in these cases ...
I wonder if there's any use in trying to share code?)
Minor stylistic quibble: I don't like where you inserted the new code in
clause_is_strict_for, because it renders the comment a few lines above
that unrelatable to the code. I'd put it at the bottom, after the
is_funcclause stanza.
regards, tom lane
On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.Otherwise you're up against the fact that the OR of zero boolean
expressions is false not null:regression=# select null::int = any(array[]::int[]);
?column?
----------
f
(1 row)
I've constructed a test case running the test_predtest function on the expression "select x is null, x = any(array[]::int[]) from integers" which I think gets at the logic bug you're noting. In doing this I noticed something I don't fully understand: on master this shows that "x = any(array[]::int[])" proves "x is null" (while my patch shows the opposite). Is this because the second expression being false means there can be no valid value for x? It also claims it refutes the same, which is even more confusing.
While this is doable when the RHS is an array constant or ArrayExpr, I'm
afraid it's likely to be a bit tedious. (Hm, actually predicate_classify
already has logic to extract the number of elements in these cases ...
I wonder if there's any use in trying to share code?)
It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.
Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items. My new code also has this "problem", but I figured it
was worth getting this logically correct before attempting to optimize
that especially since it already exists in one place.
Minor stylistic quibble: I don't like where you inserted the new code in
clause_is_strict_for, because it renders the comment a few lines above
that unrelatable to the code. I'd put it at the bottom, after the
is_funcclause stanza.
Fixed in this revision.
James Coleman
Attachments:
saop_is_not_null-v4.patch (application/octet-stream, +181 −1)
James Coleman <jtc331@gmail.com> writes:
On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.
I've constructed a test case running the test_predtest function on the expression "select x is null, x = any(array[]::int[]) from integers" which I think gets at the logic bug you're noting. In doing this I noticed something I don't fully understand: on master this shows that "x = any(array[]::int[])" proves "x is null" (while my patch shows the opposite). Is this because the second expression being false means there can be no valid value for x? It also claims it refutes the same, which is even more confusing.
That sounds like a bug, but I think it's actually OK because a vacuous
OR is necessarily false, and a false predicate "implies" any conclusion
at all according to the customary definition of implication. If there
are any real optimization bugs in this area, it's probably a result of
calling code believing more than it should about the meaning of a proof.
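The point here is the vacuous-truth rule of classical implication: a clause that is necessarily false "implies" every predicate. As a one-line model:

```python
# Material implication: A => B is (not A) or B. If A is always FALSE
# (e.g. x = ANY(array[]::int[])), the implication holds for any B,
# which is why the empty-array clause "proves" x IS NULL and also
# refutes it without that indicating a bug.
def implies(a, b):
    return (not a) or b

for b in (True, False):
    assert implies(False, b)    # a vacuously false A implies anything
```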
I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?
It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.
Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Per above, if the ScalarArrayOp is necessarily false, then we can
claim that the implication holds. However, we need to work through
all four proof rules (strong vs. weak, implication vs refutation)
and see which ones this applies to --- I'm not sure offhand that
the logic works for all four. (ENOCAFFEINE...) In any case it
requires thinking a bit differently about what clause_is_strict_for()
is really doing.
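The four proof rules can be checked by brute force over a small three-valued model (Python, None for SQL NULL; the rule definitions follow predtest.c's header comments as I understand them, and this is a sketch rather than the planner's code):

```python
def saop(x, arr):
    """x = ANY(arr) under SQL three-valued logic (strict "="); None is NULL."""
    results = [None if (x is None or e is None) else x == e for e in arr]
    if True in results:
        return True
    return None if None in results else False

# A = "x = ANY(arr)", B = "x IS NOT NULL"
checks = {
    "strong_implication": lambda a, b: a is not True or b is True,     # A true => B true
    "weak_implication":   lambda a, b: a is False or b is not False,   # A non-false => B non-false
    "strong_refutation":  lambda a, b: a is not True or b is False,    # A true => B false
    "weak_refutation":    lambda a, b: a is not True or b is not True, # A true => B not true
}

holds = {name: True for name in checks}
for x in (None, 1, 3):
    for arr in ([], [1, 2], [1, None], [None]):
        a = saop(x, arr)
        b = x is not None               # x IS NOT NULL
        for name, check in checks.items():
            holds[name] = holds[name] and check(a, b)

# Only strong implication survives: the empty array makes A vacuously
# false, which cannot break "A true => B true", while the other rules
# fail on ordinary matching rows or on NULL inputs.
assert holds["strong_implication"] is True
assert holds["weak_implication"] is False
assert holds["strong_refutation"] is False
assert holds["weak_refutation"] is False
```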
Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items.
Huh? list_length() is constant-time.
regards, tom lane
On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
There's still a logical problem here, which is that in order to prove that the ScalarArrayOpExpr yields NULL when its LHS does, you also have to prove that the RHS is not an empty array.

I've constructed a test case running the test_predtest function on the expression "select x is null, x = any(array[]::int[]) from integers" which I think gets at the logic bug you're noting. In doing this I noticed something I don't fully understand: on master this shows that "x = any(array[]::int[])" proves "x is null" (while my patch shows the opposite). Is this because the second expression being false means there can be no valid value for x? It also claims it refutes the same, which is even more confusing.

That sounds like a bug, but I think it's actually OK because a vacuous OR is necessarily false, and a false predicate "implies" any conclusion at all according to the customary definition of implication. If there are any real optimization bugs in this area, it's probably a result of calling code believing more than it should about the meaning of a proof.

I think these test cases don't actually prove much about the behavior of your patch. Wouldn't they be handled by the generic OR-conversion logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?
Which ones are you looking at in particular? The "inline" (non-cte) happy and sad path cases have 102 total array elements (as does the happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two tests are about the empty array case and so have 0 items (and were the ones that showed different behavior between v3 and v4).
It seems to me that this would mean that an IS NOT NULL index would still not be proven to be usable for a non-constant array (e.g., from a subquery calculated array), and in fact I've included a (failing) test demonstrating that fact in the attached patch. This feeds into my question above about there possibly being another corollary/proof that could be added for this case.

Hm. That would be annoying, but on reflection I think maybe we don't actually need to worry about emptiness of the array after all. The thing that we need to prove, at least for the implication case, is "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause". Per above, if the ScalarArrayOp is necessarily false, then we can claim that the implication holds. However, we need to work through all four proof rules (strong vs. weak, implication vs. refutation) and see which ones this applies to --- I'm not sure offhand that the logic works for all four. (ENOCAFFEINE...) In any case it requires thinking a bit differently about what clause_is_strict_for() is really doing.
Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?
Semi-related: I noticed that predicate_classify handles an ArrayExpr by using list_length; this seems unnecessary if we know we can fail fast. I realize it's not a common problem, but I have seen extremely long arrays before, and maybe we should fall out of the count once we hit max+1 items.

Huh? list_length() is constant-time.
Facepalm. I'd somehow had it stuck in my head that we actually
iterated the list to count.
James Coleman
James Coleman <jtc331@gmail.com> writes:
On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?
Which ones are you looking at in particular? The "inline" (non-cte)
happy and sad path cases have 102 total array elements (as does the
happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
tests are about the empty array case and so have 0 items (and were
the ones that showed different behavior between v3 and v4).
I was looking at the empty-array ones. I don't see how that reaches
your patch; we should not be doing predicate_implied_by_simple_clause
on the ScalarArrayOp itself, because predicate_classify should've
decided to treat it as an OR clause.
Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?
The latter is what I was envisioning, but I haven't worked out details.
regards, tom lane
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think these test cases don't actually prove much about the behavior of your patch. Wouldn't they be handled by the generic OR-conversion logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

Which ones are you looking at in particular? The "inline" (non-cte) happy and sad path cases have 102 total array elements (as does the happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two tests are about the empty array case and so have 0 items (and were the ones that showed different behavior between v3 and v4).

I was looking at the empty-array ones. I don't see how that reaches your patch; we should not be doing predicate_implied_by_simple_clause on the ScalarArrayOp itself, because predicate_classify should've decided to treat it as an OR clause.
At one point I was getting all 'f' instead of all 't' for those, but
now I can't reproduce it going from either master to my patch or to my
patch without the element count checks, so I'm not sure how/when I was
getting that anymore.
Are those invariants we want to keep (and recognize as regression
tests)? If so, I can confirm that they aren't duplicative of the rest
of the file, and if not I can remove them.
James Coleman
James Coleman <jtc331@gmail.com> writes:
Are those invariants we want to keep (and recognize as regression
tests)? If so, I can confirm that they aren't duplicative of the rest
of the file, and if not I can remove them.
I can't get terribly excited about them ... if we were changing their
behavior, or if there were a bug discovered here, then I'd think
differently. But I'm not a fan of adding regression test cycles
without a reason.
regards, tom lane
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. That would be annoying, but on reflection I think maybe we don't actually need to worry about emptiness of the array after all. The thing that we need to prove, at least for the implication case, is "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".

Are you thinking that implies clause_is_strict_for might not be the right/only place? Or just that the code in that function needs to consider if it should return different results in some cases to handle all 4 proof rules properly?

The latter is what I was envisioning, but I haven't worked out details.
The more I look at this, the more I wonder whether there aren't two things here: it seems that the existing patch (with the code that excludes empty arrays) is likely useful on its own for cases I hadn't originally envisioned (or tested). Specifically, it seems to allow us to conclude the result will be null if a field is known to be null. I think I'll try to add a test specific to this, but I'm not immediately sure I know a case where that matters. If anyone has an idea on what direction to head with this, I'd be happy to code it up.
For the original goal of "truth of the ScalarArrayOp implies truth of
the IS NOT NULL clause", it seems to me the proper place is
predicate_implied_by_simple_clause where we already have a place to
specifically work with IS NOT NULL expressions. That will allow this
to be more targeted, work for empty arrays as well, and not
potentially introduce an optimizer bug for strictness with empty
arrays.
I'm attaching an updated patch that does that. I've also added the
"parallel" case in predicate_refuted_by_simple_clause, but I haven't
added a test for that yet. I also would like to add a test for the sad
path to parallel the happy path computed array/cte test.
While I add that though I wanted to get this updated version out to
get feedback on the approach.
James Coleman
Attachments:
saop_is_not_null-v5.patch (application/octet-stream, +133 −1)
On Tue, 15 Jan 2019 at 12:24, James Coleman <jtc331@gmail.com> wrote:
While I add that though I wanted to get this updated version out to
get feedback on the approach.
I had a look at this and there's a couple of things I see:
1. In:
+ if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+ if (op_strict(saop->opno) &&
+ clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+ return true;
+ }
+
/* foo IS NOT NULL refutes foo IS NULL */
if (clause && IsA(clause, NullTest) &&
Your IsA(clause, ScalarArrayOpExpr) condition should also be checking
that "clause" can't be NULL. The NullTest condition one below does
this.
2. I was also staring at predicate_implied_by_simple_clause() a bit, at the use of clause_is_strict_for() to ensure that the IS NOT NULL's operand matches the ScalarArrayOpExpr left operand. Since clause_is_strict_for() = "Can we prove that "clause" returns NULL if "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's left operand and subexpr is the IS NOT NULL's operand. This means that a partial index with "WHERE a IS NOT NULL" should also be fine to use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a) must be NULL if a is NULL. It also works for WHERE a+a IN(1,2,...,101); I wonder if it's worth adding a test for that, or even just modifying one of the existing tests to ensure you get the same result from it. Perhaps it's worth an additional test to ensure that x IN(1,2,...,101) does not imply x+x IS NOT NULL, and maybe that x+x IS NULL does not refute x IN(1,2,...,101), as a strict function is free to return NULL even if its inputs are not NULL.
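That last point — a strict function may return NULL even for non-NULL input — can be sketched with a hypothetical strict_func (Python, None for SQL NULL; illustrative only, not any real catalog function):

```python
# Hypothetical strict function: NULL in => NULL out, but it also happens
# to return NULL for odd non-NULL inputs, which strictness permits.
def strict_func(a):
    if a is None:
        return None
    return a if a % 2 == 0 else None

# Forward direction: strict_func(a) = ANY(...) can only be TRUE when
# strict_func(a) is not NULL, which (by strictness) requires a IS NOT NULL.
assert strict_func(None) is None

# Reverse direction fails: x = ANY('{1,2,3}') being TRUE does not imply
# strict_func(x) IS NOT NULL.
x = 3                               # x matches the IN list
assert strict_func(x) is None       # yet strict_func(x) IS NULL
```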
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 15, 2019 at 12:47 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
1. In:
+ if (IsA(clause, ScalarArrayOpExpr))
+ {
+     ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+     Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+
+     if (op_strict(saop->opno) &&
+         clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+         return true;
+ }
+
/* foo IS NOT NULL refutes foo IS NULL */
if (clause && IsA(clause, NullTest) &&

Your IsA(clause, ScalarArrayOpExpr) condition should also be checking that "clause" can't be NULL. The NullTest condition one below does this.
Fixed in my local copy. I also did the same in
predicate_implied_by_simple_clause since it seems to have the same
potential issue.
After sleeping on it and looking at it again this morning, I also
realized I need to exclude weak implication in
predicate_refuted_by_simple_clause.
2. I was also staring at predicate_implied_by_simple_clause() a bit, at the use of clause_is_strict_for() to ensure that the IS NOT NULL's operand matches the ScalarArrayOpExpr left operand. Since clause_is_strict_for() = "Can we prove that "clause" returns NULL if "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's left operand and subexpr is the IS NOT NULL's operand. This means that a partial index with "WHERE a IS NOT NULL" should also be fine to use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a) must be NULL if a is NULL. It also works for WHERE a+a IN(1,2,...,101); I wonder if it's worth adding a test for that, or even just modifying one of the existing tests to ensure you get the same result from it. Perhaps it's worth an additional test to ensure that x IN(1,2,...,101) does not imply x+x IS NOT NULL, and maybe that x+x IS NULL does not refute x IN(1,2,...,101), as a strict function is free to return NULL even if its inputs are not NULL.
Are you suggesting a different test than clause_is_strict_for to
verify the saop LHS is the same as the null test's arg? I suppose we
could use "equal()" instead?
I've added several tests, and noticed a few things:
1. The current code doesn't result in "strongly_implied_by = t" for
the "(x + x) is not null" case, but it does result in "s_i_holds = t".
This doesn't change if I switch to using "equal()" as mentioned above.
2. The tests I have for refuting "x is null" show that w_r_holds as
well as s_r_holds. I'd only expected the latter, since only
"strongly_refuted_by = t".
3. The tests I have for refuting "(x + x) is null" show that both
s_r_holds and w_r_holds. I'd expected these to be false.
I'm attaching the current version of the patch with the new tests, but
I'm not sure I understand the *_holds results mentioned above. Are
they an artifact of the data under test? Or do they suggest a
remaining bug? I'm a bit fuzzy on what to expect for *_holds though I
understand the requirements for strongly/weakly_implied/refuted_by.
James Coleman
Attachments:
saop_is_not_null-v6.patch (application/octet-stream, +370 −1)
James Coleman <jtc331@gmail.com> writes:
I'm attaching the current version of the patch with the new tests, but
I'm not sure I understand the *_holds results mentioned above. Are
they an artifact of the data under test? Or do they suggest a
remaining bug? I'm a bit fuzzy on what to expect for *_holds though I
understand the requirements for strongly/weakly_implied/refuted_by.
IIRC, the "foo_holds" outputs mean "the foo relationship appears
to hold for these expressions across all tested inputs", for instance
s_i_holds means "there were no tested cases where A is true and B is not
true". The implied_by/refuted_by outputs mean "predtest.c claims it can
prove this". Obviously, a claimed proof for a relationship that fails
experimentally would be big trouble. The other way around just means
that either the proof rules are inadequate to prove this particular case,
or the set of test values for the expressions don't expose the fact that
the relationship doesn't hold.
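That definition of s_i_holds amounts to a one-line scan over the tested rows (names and rows below are illustrative, not the actual test_predtest implementation):

```python
# Each tuple is (A, B) as evaluated on one tested input row; None is NULL.
rows = [(True, True), (False, True), (None, False)]

# s_i_holds: "there were no tested cases where A is true and B is not true"
s_i_holds = all(not (a is True and b is not True) for a, b in rows)
assert s_i_holds
```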
Now, if we *expected* that predtest.c should be able to prove something,
failure to do so would be a bug, but it's not a bug if we know it's
incapable of making that proof. The second explanation might represent
a bug in the test case.
I added the foo_holds columns just as a sort of cross-check on the
test cases themselves, they don't test anything about the predtest
code.
regards, tom lane
James Coleman <jtc331@gmail.com> writes:
[ saop_is_not_null-v6.patch ]
I quite dislike taking the responsibility out of clause_is_strict_for
and putting it in the callers. Aside from requiring duplicative code,
it means that this fails to prove anything for recursive situations
(i.e., where the ScalarArrayOp appears one or more levels down in a
clause examined by clause_is_strict_for).
If the behavior needs to vary depending on proof rule, which it
looks like it does, the way to handle that is to add flag argument(s)
to clause_is_strict_for.
I'm also not happy about the lack of comments justifying the proof
rules -- eg, it's far from obvious why you need to restrict one
case to !weak but not the other.
regards, tom lane