Proving IS NOT NULL inference for ScalarArrayOpExpr's

Started by James Colemanabout 7 years ago41 messages

jtc331@gmail.com

about 7 years ago

1 attachment(s)

I've recently been investigating improving our plans for queries like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000);
where the table "t" has a partial index on "foo" where "foo IS NOT NULL".

Currently the planner generates an index [only] scan so long as the number
of
items in the IN expression is <= 100, but as soon as you add the 101st item
it reverts to seq scan. If we add the explicit null check like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
then we go back to the desired index scan.

This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
constant array arguments into OR trees when the array size is > 100. The
rest
of the predicate proving code then becomes unable to infer that foo is not
null
and therefore the planner cannot prove that the partial index is correct to
use.

(Please pardon technical details in the below background that may be way
off;
I don't have a lot of experience with the Postgres codebase yet, and am
still
trying to build a mental model of things.)

At first I was imagining having the parse keep track of whether an array
const
expr contained any nulls and perhaps adding generated quals (in an
equivalence
class?) to allow the planner to easily prove the index was correct. I'd been
going down this track because in my mind the issue was because the planner
needed to verify whether all of the array elements were not null.

But as I started to dig into the predtest.c NOT NULL proofs and add test
cases,
I realized that at least in many normal op cases we can safely infer that
foo
is not null when "foo <op> <array>" is true even if the array contains null
elements.

This is such a simple change that it seems like I must be missing a case
where
the above doesn't hold true, but I can't immediately think of any, and
indeed
with the attached patch all existing tests pass (including some additional
ones I added for predtest to play around with it).

Am I missing something obvious? Is this a valid approach?

Attachments:

saop_is_not_null-v1.patchapplication/octet-stream; name=saop_is_not_null-v1.patchDownload

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index cba6608ce2..6d6c0faf4f 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -1339,6 +1339,12 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 		}
 		return false;
 	}
+	if (IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno))
+			return true;
+	}
 	if (is_funcclause(clause) &&
 		func_strict(((FuncExpr *) clause)->funcid))
 	{

James Coleman

jtc331@gmail.com

about 7 years ago

In reply to: James Coleman (#1)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

I've become more confident that this approach is correct after
discussions with others on my team and have added the patch to the
open commitfest.

I'm attaching v2 of the patch here with a regression test (that fails
on current master, but is green both with my patch and with current
master if you remove items from the test array to make the array <=
100 items) as well as a comment detailing the reasoning in predtest.c.

Show quoted text

On Sat, Nov 10, 2018 at 4:33 PM James Coleman <jtc331@gmail.com> wrote:

I've recently been investigating improving our plans for queries like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000);
where the table "t" has a partial index on "foo" where "foo IS NOT NULL".

Currently the planner generates an index [only] scan so long as the number of
items in the IN expression is <= 100, but as soon as you add the 101st item
it reverts to seq scan. If we add the explicit null check like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
then we go back to the desired index scan.

This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
constant array arguments into OR trees when the array size is > 100. The rest
of the predicate proving code then becomes unable to infer that foo is not null
and therefore the planner cannot prove that the partial index is correct to
use.

(Please pardon technical details in the below background that may be way off;
I don't have a lot of experience with the Postgres codebase yet, and am still
trying to build a mental model of things.)

At first I was imagining having the parse keep track of whether an array const
expr contained any nulls and perhaps adding generated quals (in an equivalence
class?) to allow the planner to easily prove the index was correct. I'd been
going down this track because in my mind the issue was because the planner
needed to verify whether all of the array elements were not null.

But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
I realized that at least in many normal op cases we can safely infer that foo
is not null when "foo <op> <array>" is true even if the array contains null
elements.

This is such a simple change that it seems like I must be missing a case where
the above doesn't hold true, but I can't immediately think of any, and indeed
with the attached patch all existing tests pass (including some additional
ones I added for predtest to play around with it).

Am I missing something obvious? Is this a valid approach?

Other outstanding questions:

Should I add additional tests for predtest? It already seems to cover some null
test cases with scalar array ops, but I'd be happy to add more if desired.

Should I add a test case for the resulting plan with "foo IN (...)" with an
array with more than 100 elements?

Thanks,
James Coleman

Attachments:

saop_is_not_null-v2.patchapplication/octet-stream; name=saop_is_not_null-v2.patchDownload

commit a1133e4dbf7e7d7f575a64e31121d1d3d03998ec
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Prove IS NOT NULL inference for large arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). However all scalar array ops IS NOT NULL can be inferred
    trivially without decomposing into AND/OR chains.
    
    We teach predtest to check for strict operators in ScalarArrayOpExpr
    nodes instead of relying on checking strictness of operators in AND/OR
    chains.
    
    This allows the planner to use partial indexes of the form "WHERE foo IS
    NOT NULL" when the query's WHERE clause involves a scalar array op like
    "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 446207de30..1901de7470 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -1340,6 +1340,18 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 		}
 		return false;
 	}
+  /*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * scalar array ops separately (this case will occur when the array has
+	 * more than MAX_SAOP_ARRAY_SIZE items).
+	 */
+	if (IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno))
+			return true;
+	}
 	if (is_funcclause(clause) &&
 		func_strict(((FuncExpr *) clause)->funcid))
 	{
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..e290f3bfb5 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,25 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+select * from test_predtest($$
+-- We want to test an array longer than MAX_SAOP_ARRAY_SIZE so that we're
+-- not relying on predtest turning the array op into set of OR quals. We
+-- also want to include at least one null value.
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..e38502b375 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,16 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+select * from test_predtest($$
+-- We want to test an array longer than MAX_SAOP_ARRAY_SIZE so that we're
+-- not relying on predtest turning the array op into set of OR quals. We
+-- also want to include at least one null value.
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);

James Coleman

jtc331@gmail.com

about 7 years ago

In reply to: James Coleman (#2)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

Also, my apologies for top posting; I forgot to remove the old email
before clicking send.

James Coleman

jtc331@gmail.com

about 7 years ago

In reply to: James Coleman (#3)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

Is there anything I can do to solicit reviewers for the current CF?

The patch is quite small and should be fairly simple to review.

- James

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: James Coleman (#4)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

Is there anything I can do to solicit reviewers for the current CF?

There is no active CF, which is why nothing is happening. We'll
start looking at the 2019-01 items in January. Right now, people
are either on vacation or working on their own patches.

regards, tom lane

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: James Coleman (#1)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Sun, 11 Nov 2018 at 10:33, James Coleman <jtc331@gmail.com> wrote:

At first I was imagining having the parse keep track of whether an array const
expr contained any nulls and perhaps adding generated quals (in an equivalence
class?) to allow the planner to easily prove the index was correct. I'd been
going down this track because in my mind the issue was because the planner
needed to verify whether all of the array elements were not null.

But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
I realized that at least in many normal op cases we can safely infer that foo
is not null when "foo <op> <array>" is true even if the array contains null
elements.

This is such a simple change that it seems like I must be missing a case where
the above doesn't hold true, but I can't immediately think of any, and indeed
with the attached patch all existing tests pass (including some additional
ones I added for predtest to play around with it).

Am I missing something obvious? Is this a valid approach?

I started looking at this patch and I see that there's no check to
ensure that the IS NOT NULL pred matches the left operand in the IN()
clause.

Here's a demo of why this is bad:

create table t (a int, b int);
create index t_a_is_not_null on t(a) where a is not null;
insert into t values(null,1); -- not indexed
set enable_seqscan=0; -- force index to be used whenever possible
select * from t where b
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- row missing!
a | b
---+---
(0 rows)

explain select * from t where b
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98
,99,100,101); -- should not use t_a_is_not_null index.

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
Bitmap Heap Scan on t (cost=4.42..320.84 rows=1141 width=8)
Recheck Cond: (a IS NOT NULL)
Filter: (b = ANY
('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101}'::integer[]))
-> Bitmap Index Scan on t_a_is_not_null (cost=0.00..4.13 rows=2249 width=0)
(4 rows)

set enable_seqscan=1;
SET
select * from t where b
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- Correct!
a | b
---+---
| 1
(1 row)

Basically, the planner assumes that the WHERE a IS NOT NULL index
implies WHERE b IN(1,...,101), which is definitely not the case.

Probably your new code needs to be expanded to become:

if (IsA(clause, ScalarArrayOpExpr))
{
ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
if (op_strict(saop->opno) &&
clause_is_strict_for((Node *) linitial(saop->args), subexpr))
return true;
}

Other outstanding questions:

Should I add additional tests for predtest? It already seems to cover some null
test cases with scalar array ops, but I'd be happy to add more if desired.

Should I add a test case for the resulting plan with "foo IN (...)" with an
array with more than 100 elements?

I've not looked in detail, but perhaps the place to put the tests are
in src/test/modules/test_predtest. This module adds a function named
test_predtest() that you can likely use more easily than trying to
EXPLAIN plans and hoping the planner's behaviour helps to exhibit the
behaviour of the changed code.

I'll mark this as waiting on author.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: David Rowley (#6)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Sun, 13 Jan 2019 at 14:49, David Rowley <david.rowley@2ndquadrant.com> wrote:

I've not looked in detail, but perhaps the place to put the tests are
in src/test/modules/test_predtest. This module adds a function named
test_predtest() that you can likely use more easily than trying to
EXPLAIN plans and hoping the planner's behaviour helps to exhibit the
behaviour of the changed code.

I just noticed that I accidentally reviewed v1 instead of v2. I see
you found that module. I'll keep this as waiting on author until the
other bug is fixed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: David Rowley (#7)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Sat, Jan 12, 2019 at 8:52 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:

Basically, the planner assumes that the WHERE a IS NOT NULL index
implies WHERE b IN(1,...,101), which is definitely not the case.

Probably your new code needs to be expanded to become:

if (IsA(clause, ScalarArrayOpExpr))
{
ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
if (op_strict(saop->opno) &&
clause_is_strict_for((Node *) linitial(saop->args), subexpr))
return true;
}

I've updated the code as well as included the sad path test in
addition to my original happy path test.

Patch v3 attached.

James Coleman

Attachments:

saop_is_not_null-v3.patchapplication/octet-stream; name=saop_is_not_null-v3.patchDownload

commit c2ab637f8a1094bd9a9dde5b1957a231e388534a
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Prove IS NOT NULL inference for large arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). However all scalar array ops IS NOT NULL can be inferred
    trivially without decomposing into AND/OR chains.
    
    We teach predtest to check for strict operators in ScalarArrayOpExpr
    nodes instead of relying on checking strictness of operators in AND/OR
    chains.
    
    This allows the planner to use partial indexes of the form "WHERE foo IS
    NOT NULL" when the query's WHERE clause involves a scalar array op like
    "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..18be276d74 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -1340,6 +1340,19 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 		}
 		return false;
 	}
+  /*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * scalar array ops separately (this case will occur when the array has
+	 * more than MAX_SAOP_ARRAY_SIZE items).
+	 */
+	if (IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno) &&
+			clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+			return true;
+	}
 	if (is_funcclause(clause) &&
 		func_strict(((FuncExpr *) clause)->funcid))
 	{
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..a0010f05cc 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,45 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..13ef296b34 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,27 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#8)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

[ saop_is_not_null-v3.patch ]

There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.
Otherwise you're up against the fact that the OR of zero boolean
expressions is false not null:

regression=# select null::int = any(array[]::int[]);
?column?
----------
f
(1 row)

While this is doable when the RHS is an array constant or ArrayExpr, I'm
afraid it's likely to be a bit tedious. (Hm, actually predicate_classify
already has logic to extract the number of elements in these cases ...
I wonder if there's any use in trying to share code?)

Minor stylistic quibble: I don't like where you inserted the new code in
clause_is_strict_for, because it renders the comment a few lines above
that unrelatable to the code. I'd put it at the bottom, after the
is_funcclause stanza.

regards, tom lane

#10

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#9)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.

Otherwise you're up against the fact that the OR of zero boolean
expressions is false not null:

regression=# select null::int = any(array[]::int[]);
?column?
----------
f
(1 row)

I've constructed a test case running the test_predtest function on the
expression
"select x is null, x = any(array[]::int[]) from integers" which I
think gets at the logic bug you're noting. In doing this I noticed
something I don't fully understand: on master this shows that "x =
any(array[]::int[])" proves "x is null" (while my patch shows the
opposite). Is this because the second expression being false means
there's can be no valid value for x? It also claims it refutes the
same, which is even more confusing.

While this is doable when the RHS is an array constant or ArrayExpr, I'm
afraid it's likely to be a bit tedious. (Hm, actually predicate_classify
already has logic to extract the number of elements in these cases ...
I wonder if there's any use in trying to share code?)

It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.

Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items. My new code also has this "problem", but I figured it
was worth getting this logically correct before attempting to optimize
that especially since it already exists in one place.

Minor stylistic quibble: I don't like where you inserted the new code in
clause_is_strict_for, because it renders the comment a few lines above
that unrelatable to the code. I'd put it at the bottom, after the
is_funcclause stanza.

Fixed in this revision.

James Coleman

Attachments:

saop_is_not_null-v4.patchapplication/octet-stream; name=saop_is_not_null-v4.patchDownload

commit 012086a88c5f206b02747899e5cb86ce41459e53
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Prove IS NOT NULL inference for large arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). However all scalar array ops IS NOT NULL can be inferred
    trivially without decomposing into AND/OR chains.
    
    We teach predtest to check for strict operators in ScalarArrayOpExpr
    nodes instead of relying on checking strictness of operators in AND/OR
    chains.
    
    This allows the planner to use partial indexes of the form "WHERE foo IS
    NOT NULL" when the query's WHERE clause involves a scalar array op like
    "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..4968136d79 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -815,7 +815,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1351,6 +1351,42 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 		return false;
 	}
 
+	/*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * scalar array ops separately (this case will occur when the array has
+	 * more than MAX_SAOP_ARRAY_SIZE items).
+	 */
+	if (IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno) &&
+			clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+		{
+			Node	   *arraynode = (Node *) lsecond(saop->args);
+			int nelems = 0;
+
+			if (arraynode && IsA(arraynode, Const) &&
+				!((Const *) arraynode)->constisnull)
+			{
+				ArrayType  *arrayval;
+				arrayval = DatumGetArrayTypeP(((Const *) arraynode)->constvalue);
+				nelems = ArrayGetNItems(ARR_NDIM(arrayval), ARR_DIMS(arrayval));
+			}
+			else if (arraynode && IsA(arraynode, ArrayExpr) &&
+					 !((ArrayExpr *) arraynode)->multidims)
+				nelems = list_length(((ArrayExpr *) arraynode)->elements);
+
+			/*
+			 * Empty AND/OR groups resolve to true or false rather than null,
+			 * so in addition to verifying the operation is otherwise strict,
+			 * we also have to verify we have a non-empty array.
+			 */
+			if (nelems > 0)
+				return true;
+		}
+	}
+
 	return false;
 }
 
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..7210845b77 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,97 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array[
+    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+    29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+    54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+    79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+  ]
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+-- Verify ScalarArrayOpExpr empty array base cases since they resolve
+-- to true or false instead of null.
+select * from test_predtest($$
+select x is not null, x = any(array[]::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | t
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x is null, x = any(array[]::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | t
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..069521421c 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,53 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array[
+    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+    29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+    54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+    79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+  ]
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- Verify ScalarArrayOpExpr empty array base cases since they resolve
+-- to true or false instead of null.
+select * from test_predtest($$
+select x is not null, x = any(array[]::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, x = any(array[]::int[])
+from integers
+$$);
+

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#10)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.

I've constructed a test case running the test_predtest function on the
expression
"select x is null, x = any(array[]::int[]) from integers" which I
think gets at the logic bug you're noting. In doing this I noticed
something I don't fully understand: on master this shows that "x =
any(array[]::int[])" proves "x is null" (while my patch shows the
opposite). Is this because the second expression being false means
there's can be no valid value for x? It also claims it refutes the
same, which is even more confusing.

That sounds like a bug, but I think it's actually OK because a vacuous
OR is necessarily false, and a false predicate "implies" any conclusion
at all according to the customary definition of implication. If there
are any real optimization bugs in this area, it's probably a result of
calling code believing more than it should about the meaning of a proof.

I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.

Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Per above, if the ScalarArrayOp is necessarily false, then we can
claim that the implication holds. However, we need to work through
all four proof rules (strong vs. weak, implication vs refutation)
and see which ones this applies to --- I'm not sure offhand that
the logic works for all four. (ENOCAFFEINE...) In any case it
requires thinking a bit differently about what clause_is_strict_for()
is really doing.

Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items.

Huh? list_length() is constant-time.

regards, tom lane

#12

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#11)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:

On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.

I've constructed a test case running the test_predtest function on the
expression
"select x is null, x = any(array[]::int[]) from integers" which I
think gets at the logic bug you're noting. In doing this I noticed
something I don't fully understand: on master this shows that "x =
any(array[]::int[])" proves "x is null" (while my patch shows the
opposite). Is this because the second expression being false means
there's can be no valid value for x? It also claims it refutes the
same, which is even more confusing.

That sounds like a bug, but I think it's actually OK because a vacuous
OR is necessarily false, and a false predicate "implies" any conclusion
at all according to the customary definition of implication. If there
are any real optimization bugs in this area, it's probably a result of
calling code believing more than it should about the meaning of a proof.

I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

Which ones are you looking at in particular? The "inline" (non-cte)
happy and sad path cases have 102 total array elements (as does the
happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
tests are about the empty array case so much have 0 items (and were
the ones that showed different behavior between v3 and v4).

It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.

Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Per above, if the ScalarArrayOp is necessarily false, then we can
claim that the implication holds. However, we need to work through
all four proof rules (strong vs. weak, implication vs refutation)
and see which ones this applies to --- I'm not sure offhand that
the logic works for all four. (ENOCAFFEINE...) In any case it
requires thinking a bit differently about what clause_is_strict_for()
is really doing.

Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?

Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items.

Huh? list_length() is constant-time.

Facepalm. I'd somehow had it stuck in my head that we actually
iterated the list to count.

James Coleman

#13

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#12)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

Which ones are you looking at in particular? The "inline" (non-cte)
happy and sad path cases have 102 total array elements (as does the
happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
tests are about the empty array case so much have 0 items (and were
the ones that showed different behavior between v3 and v4).

I was looking at the empty-array ones. I don't see how that reaches
your patch; we should not be doing predicate_implied_by_simple_clause
on the ScalarArrayOp itself, because predicate_classify should've
decided to treat it as an OR clause.

Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".

Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?

The latter is what I was envisioning, but I haven't worked out details.

regards, tom lane

#14

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#13)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Mon, Jan 14, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:

On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

Which ones are you looking at in particular? The "inline" (non-cte)
happy and sad path cases have 102 total array elements (as does the
happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
tests are about the empty array case so much have 0 items (and were
the ones that showed different behavior between v3 and v4).

I was looking at the empty-array ones. I don't see how that reaches
your patch; we should not be doing predicate_implied_by_simple_clause
on the ScalarArrayOp itself, because predicate_classify should've
decided to treat it as an OR clause.

At one point I was getting all 'f' instead of all 't' for those, but
now I can't reproduce it going from either master to my patch or to my
patch without the element count checks, so I'm not sure how/when I was
getting that anymore.

Are those invariants we want to keep (and recognize as regression
tests)? If so, I can confirm that they aren't duplicative of the rest
of the file, and if not I can remove them.

James Coleman

#15

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#14)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

Are those invariants we want to keep (and recognize as regression
tests)? If so, I can confirm that they aren't duplicative of the rest
of the file, and if not I can remove them.

I can't get terribly excited about them ... if we were changing their
behavior, or if there were a bug discovered here, then I'd think
differently. But I'm not a fan of adding regression test cycles
without a reason.

regards, tom lane

#16

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#13)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Mon, Jan 14, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".

Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?

The latter is what I was envisioning, but I haven't worked out details.

The more that I look at this I'm wondering if there aren't two things
here: it seems that the existing patch (with the code that excludes
empty arrays) is likely useful on its own for cases I hadn't
originally envisioned (or tested). Specifically it seems to allow us
to conclude the result will be null if a field is known to be null. I
think I'll try to add a test specific to this, but I'm not immediately
sure I know a case where that matters. If anyone has an idea on what
direction to head with this, I'd be happy to code it up.

For the original goal of "truth of the ScalarArrayOp implies truth of
the IS NOT NULL clause", it seems to me the proper place is
predicate_implied_by_simple_clause where we already have a place to
specifically work with IS NOT NULL expressions. That will allow this
to be more targeted, work for empty arrays as well, and not
potentially introduce an optimizer bug for strictness with empty
arrays.

I'm attaching an updated patch that does that. I've also added the
"parallel" case in predicate_refuted_by_simple_clause, but I haven't
added a test for that yet. I also would like to add a test for the sad
path to parallel the happy path computed array/cte test.

While I add that though I wanted to get this updated version out to
get feedback on the approach.

James Coleman

Attachments:

saop_is_not_null-v5.patchapplication/octet-stream; name=saop_is_not_null-v5.patchDownload

commit 903ed4feebd1a30f0b888555347a6782d4d8ff1e
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Prove IS NOT NULL inference for large arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). However all scalar array ops IS NOT NULL can be inferred
    trivially without decomposing into AND/OR chains.
    
    We teach predtest to check for strict operators in ScalarArrayOpExpr
    nodes instead of relying on checking strictness of operators in AND/OR
    chains.
    
    This allows the planner to use partial indexes of the form "WHERE foo IS
    NOT NULL" when the query's WHERE clause involves a scalar array op like
    "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..719fd1b8b5 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -815,7 +815,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1132,6 +1132,21 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 			/* strictness of clause for foo implies foo IS NOT NULL */
 			if (clause_is_strict_for(clause, (Node *) ntest->arg))
 				return true;
+
+			/*
+			 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+			 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+			 * scalar array ops separately (this case will occur when the array has
+			 * more than MAX_SAOP_ARRAY_SIZE items).
+			 */
+			if (IsA(clause, ScalarArrayOpExpr))
+			{
+				ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+				Node *subexpr = (Node *) ntest->arg;
+				if (op_strict(saop->opno) &&
+					clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+					return true;
+			}
 		}
 		return false;			/* we can't succeed below... */
 	}
@@ -1196,6 +1211,21 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 		if (clause_is_strict_for(clause, (Node *) isnullarg))
 			return true;
 
+		/*
+		 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+		 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+		 * scalar array ops separately (this case will occur when the array has
+		 * more than MAX_SAOP_ARRAY_SIZE items).
+		 */
+		if (IsA(clause, ScalarArrayOpExpr))
+		{
+			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+			Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+			if (op_strict(saop->opno) &&
+				clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+				return true;
+		}
+
 		/* foo IS NOT NULL refutes foo IS NULL */
 		if (clause && IsA(clause, NullTest) &&
 			((NullTest *) clause)->nulltesttype == IS_NOT_NULL &&
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..cea5cdacd3 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,67 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array[
+    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+    29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+    54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+    79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+  ]
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..12ed8e2d8a 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,41 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array[
+    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+    29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+    54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+    79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+  ]
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+

#17

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: James Coleman (#16)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, 15 Jan 2019 at 12:24, James Coleman <jtc331@gmail.com> wrote:

While I add that though I wanted to get this updated version out to
get feedback on the approach.

I had a look at this and there's a couple of things I see:

1. In:

+ if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+ if (op_strict(saop->opno) &&
+ clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+ return true;
+ }
+
  /* foo IS NOT NULL refutes foo IS NULL */
  if (clause && IsA(clause, NullTest) &&

Your IsA(clause, ScalarArrayOpExpr) condition should also be checking
that "clause" can't be NULL. The NullTest condition one below does
this.

2. I was also staring predicate_implied_by_simple_clause() a bit at
the use of clause_is_strict_for() to ensure that the IS NOT NULL's
operand matches the ScalarArrayOpExpr left operand. Since
clause_is_strict_for() = "Can we prove that "clause" returns NULL if
"subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's
left operand and subexpr is the IS NOT NULL's operand. This means
that a partial index with "WHERE a IS NOT NULL" should also be fine to
use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a)
must be NULL if a is NULL. Also also works for WHERE a+a
IN(1,2,...,101); I wonder if it's worth adding a test for that, or
even just modify one of the existing tests to ensure you get the same
result from it. Perhaps it's worth an additional test to ensure that x
IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS
NULL does not refute x IN(1,2,...,101), as a strict function is free
to return NULL even if it's input are not NULL.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: David Rowley (#17)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, Jan 15, 2019 at 12:47 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:

1. In:

+ if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+ if (op_strict(saop->opno) &&
+ clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+ return true;
+ }
+
/* foo IS NOT NULL refutes foo IS NULL */
if (clause && IsA(clause, NullTest) &&

Your IsA(clause, ScalarArrayOpExpr) condition should also be checking
that "clause" can't be NULL. The NullTest condition one below does
this.

Fixed in my local copy. I also did the same in
predicate_implied_by_simple_clause since it seems to have the same
potential issue.

After sleeping on it and looking at it again this morning, I also
realized I need to exclude weak implication in
predicate_refuted_by_simple_clause.

2. I was also staring predicate_implied_by_simple_clause() a bit at
the use of clause_is_strict_for() to ensure that the IS NOT NULL's
operand matches the ScalarArrayOpExpr left operand. Since
clause_is_strict_for() = "Can we prove that "clause" returns NULL if
"subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's
left operand and subexpr is the IS NOT NULL's operand. This means
that a partial index with "WHERE a IS NOT NULL" should also be fine to
use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a)
must be NULL if a is NULL. Also also works for WHERE a+a
IN(1,2,...,101); I wonder if it's worth adding a test for that, or
even just modify one of the existing tests to ensure you get the same
result from it. Perhaps it's worth an additional test to ensure that x
IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS
NULL does not refute x IN(1,2,...,101), as a strict function is free
to return NULL even if it's input are not NULL.

Are you suggesting a different test than clause_is_strict_for to
verify the saop LHS is the same as the null test's arg? I suppose we
could use "equal()" instead?

I've added several tests, and noticed a few things:

1. The current code doesn't result in "strongly_implied_by = t" for
the "(x + x) is not null" case, but it does result in "s_i_holds = t".
This doesn't change if I switch to using "equal()" as mentioned above.

2. The tests I have for refuting "x is null" show that w_r_holds as
well as s_r_holds. I'd only expected the latter, since only
"strongly_refuted_by = t".

3. The tests I have for refuting "(x + x) is null" show that both
s_r_holds and w_r_holds. I'd expected these to be false.

I'm attaching the current version of the patch with the new tests, but
I'm not sure I understand the *_holds results mentioned above. Are
they an artifact of the data under test? Or do they suggest a
remaining bug? I'm a bit fuzzy on what to expect for *_holds though I
understand the requirements for strongly/weakly_implied/refuted_by.

James Coleman

Attachments:

saop_is_not_null-v6.patchapplication/octet-stream; name=saop_is_not_null-v6.patchDownload

commit 14edda8344f24ebd4ecf9672cdc9b61bfec18ace
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Prove IS NOT NULL inference for large arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). However all scalar array ops IS NOT NULL can be inferred
    trivially without decomposing into AND/OR chains.
    
    We teach predtest to check for strict operators in ScalarArrayOpExpr
    nodes instead of relying on checking strictness of operators in AND/OR
    chains.
    
    This allows the planner to use partial indexes of the form "WHERE foo IS
    NOT NULL" when the query's WHERE clause involves a scalar array op like
    "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..671b8d8c45 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -815,7 +815,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1132,6 +1132,21 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 			/* strictness of clause for foo implies foo IS NOT NULL */
 			if (clause_is_strict_for(clause, (Node *) ntest->arg))
 				return true;
+
+			/*
+			 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+			 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+			 * scalar array ops separately (this case will occur when the array has
+			 * more than MAX_SAOP_ARRAY_SIZE items).
+			 */
+			if (clause && IsA(clause, ScalarArrayOpExpr))
+			{
+				ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+				Node *subexpr = (Node *) ntest->arg;
+				if (op_strict(saop->opno) &&
+					clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+					return true;
+			}
 		}
 		return false;			/* we can't succeed below... */
 	}
@@ -1196,6 +1211,21 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 		if (clause_is_strict_for(clause, (Node *) isnullarg))
 			return true;
 
+		/*
+		 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+		 * to arrays with at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+		 * scalar array ops separately (this case will occur when the array has
+		 * more than MAX_SAOP_ARRAY_SIZE items).
+		 */
+		if (clause && !weak && IsA(clause, ScalarArrayOpExpr))
+		{
+			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+			Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+			if (op_strict(saop->opno) &&
+				clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+				return true;
+		}
+
 		/* foo IS NOT NULL refutes foo IS NULL */
 		if (clause && IsA(clause, NullTest) &&
 			((NullTest *) clause)->nulltesttype == IS_NOT_NULL &&
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..49323818ab 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,226 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..830f41ee3c 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,119 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning
+-- the array op into set of OR quals. We also need to include at least
+-- one null value to demonstrate strict operators are checked properly.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+

#19

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#18)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

I'm attaching the current version of the patch with the new tests, but
I'm not sure I understand the *_holds results mentioned above. Are
they an artifact of the data under test? Or do they suggest a
remaining bug? I'm a bit fuzzy on what to expect for *_holds though I
understand the requirements for strongly/weakly_implied/refuted_by.

IIRC, the "foo_holds" outputs mean "the foo relationship appears
to hold for these expressions across all tested inputs", for instance
s_i_holds means "there were no tested cases where A is true and B is not
true". The implied_by/refuted_by outputs mean "predtest.c claims it can
prove this". Obviously, a claimed proof for a relationship that fails
experimentally would be big trouble. The other way around just means
that either the proof rules are inadequate to prove this particular case,
or the set of test values for the expressions don't expose the fact that
the relationship doesn't hold.

Now, if we *expected* that predtest.c should be able to prove something,
failure to do so would be a bug, but it's not a bug if we know it's
incapable of making that proof. The second explanation might represent
a bug in the test case.

I added the foo_holds columns just as a sort of cross-check on the
test cases themselves, they don't test anything about the predtest
code.

regards, tom lane

#20

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#18)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

[ saop_is_not_null-v6.patch ]

I quite dislike taking the responsibility out of clause_is_strict_for
and putting it in the callers. Aside from requiring duplicative code,
it means that this fails to prove anything for recursive situations
(i.e., where the ScalarArrayOp appears one or more levels down in a
clause examined by clause_is_strict_for).

If the behavior needs to vary depending on proof rule, which it
looks like it does, the way to handle that is to add flag argument(s)
to clause_is_strict_for.

I'm also not happy about the lack of comments justifying the proof
rules -- eg, it's far from obvious why you need to restrict one
case to !weak but not the other.

regards, tom lane

#21

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: James Coleman (#18)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Wed, 16 Jan 2019 at 03:33, James Coleman <jtc331@gmail.com> wrote:

2. I was also staring predicate_implied_by_simple_clause() a bit at
the use of clause_is_strict_for() to ensure that the IS NOT NULL's
operand matches the ScalarArrayOpExpr left operand. Since
clause_is_strict_for() = "Can we prove that "clause" returns NULL if
"subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's
left operand and subexpr is the IS NOT NULL's operand. This means
that a partial index with "WHERE a IS NOT NULL" should also be fine to
use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a)
must be NULL if a is NULL. Also also works for WHERE a+a
IN(1,2,...,101); I wonder if it's worth adding a test for that, or
even just modify one of the existing tests to ensure you get the same
result from it. Perhaps it's worth an additional test to ensure that x
IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS
NULL does not refute x IN(1,2,...,101), as a strict function is free
to return NULL even if it's input are not NULL.

Are you suggesting a different test than clause_is_strict_for to
verify the saop LHS is the same as the null test's arg? I suppose we
could use "equal()" instead?

I wasn't suggesting any code changes. I just thought the code was
sufficiently hard to understand to warrant some additional tests that
ensure we don't assume that if the int4 column x is not null that also
x+x is not null. Only the reverse can be implied since int4pl is
strict.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#22

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#20)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, Jan 15, 2019 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:

[ saop_is_not_null-v6.patch ]

I quite dislike taking the responsibility out of clause_is_strict_for
and putting it in the callers. Aside from requiring duplicative code,
it means that this fails to prove anything for recursive situations
(i.e., where the ScalarArrayOp appears one or more levels down in a
clause examined by clause_is_strict_for).

If the behavior needs to vary depending on proof rule, which it
looks like it does, the way to handle that is to add flag argument(s)
to clause_is_strict_for.

The reason I moved it was that we're no longer just proving
strictness, so it seemed odd to put it in a function specifically
named to be about proving strictness.

If you'd prefer an argument like "bool allow_false" or "bool
check_saop_implies_is_null" I'm happy to make that change, but it
seems generally unhelpful to change the function implementation to
contradict the name (the empty array case that returns false even then
the lhs is null).

I'm also not happy about the lack of comments justifying the proof
rules -- eg, it's far from obvious why you need to restrict one
case to !weak but not the other.

Both implication and refutation have the !weak check; it's just that
the check in the implication function was already present.

I'll add more comments inline for the proof rules, but from what I can
tell only strong implication holds for implication of IS NOT NULL. A
sample case disproving weak implication is "select null is not null,
null::int = any(array[null]::int[])" which results in "false, null",
so non-falsity does not imply non-falsity.

However I'm currently unable to come up with a case for refutation
such that truth of a ScalarArrayOp refutes IS NULL, so I'll update
code and tests for that. That also takes care of my point:

James Coleman <jtc331@gmail.com> writes:

2. The tests I have for refuting "x is null" show that w_r_holds as
well as s_r_holds. I'd only expected the latter, since only
"strongly_refuted_by = t".

But it still leaves my questions (1) and (3).

James Coleman

#23

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: David Rowley (#21)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, Jan 15, 2019 at 4:36 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:

I wasn't suggesting any code changes. I just thought the code was
sufficiently hard to understand to warrant some additional tests that
ensure we don't assume that if the int4 column x is not null that also
x+x is not null. Only the reverse can be implied since int4pl is
strict.

At the risk of missing something obvious, I'm not sure I see a case
where "x is not null" does not imply "(x + x) is not null", at least
for integers. Since an integer + an integer results in an integer,
then it must imply the addition of itself is not null also?

This is the root of the questions I had:

James Coleman <jtc331@gmail.com> writes:

1. The current code doesn't result in "strongly_implied_by = t" for
the "(x + x) is not null" case, but it does result in "s_i_holds = t".
This doesn't change if I switch to using "equal()" as mentioned above.

3. The tests I have for refuting "(x + x) is null" show that both
s_r_holds and w_r_holds. I'd expected these to be false.

James Coleman

#24

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: James Coleman (#23)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Wed, 16 Jan 2019 at 14:05, James Coleman <jtc331@gmail.com> wrote:

At the risk of missing something obvious, I'm not sure I see a case
where "x is not null" does not imply "(x + x) is not null", at least
for integers. Since an integer + an integer results in an integer,
then it must imply the addition of itself is not null also?

A strict function guarantees it will return NULL on any NULL input.
This does not mean it can't return NULL on a non-NULL input.

While int4pl might do what you want, some other strict function might
not. A simple example would be a strict function that decided to
return NULL when the two ints combined overflowed int.

The docs [1]https://www.postgresql.org/docs/devel/sql-createfunction.html define STRICT as:

"RETURNS NULL ON NULL INPUT or STRICT indicates that the function
always returns null whenever any of its arguments are null. If this
parameter is specified, the function is not executed when there are
null arguments; instead a null result is assumed automatically."

[1]: https://www.postgresql.org/docs/devel/sql-createfunction.html

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#25

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: David Rowley (#24)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, Jan 15, 2019 at 8:14 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:

On Wed, 16 Jan 2019 at 14:05, James Coleman <jtc331@gmail.com> wrote:

At the risk of missing something obvious, I'm not sure I see a case
where "x is not null" does not imply "(x + x) is not null", at least
for integers. Since an integer + an integer results in an integer,
then it must imply the addition of itself is not null also?

A strict function guarantees it will return NULL on any NULL input.
This does not mean it can't return NULL on a non-NULL input.

While int4pl might do what you want, some other strict function might
not. A simple example would be a strict function that decided to
return NULL when the two ints combined overflowed int.

Yes, the claim about not all strict functions guarantee this (like
int4pl) made sense.

Is your preference in this kind of case to comment the code and/or
tests but stick with int4pl even though it doesn't demonstrate the
"problem", or try to engineer a different test case such that the
*_holds results in the tests don't seem to imply we could prove more
things than we do?

James Coleman

#26

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: James Coleman (#25)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Wed, 16 Jan 2019 at 14:29, James Coleman <jtc331@gmail.com> wrote:

On Tue, Jan 15, 2019 at 8:14 PM David Rowley

While int4pl might do what you want, some other strict function might
not. A simple example would be a strict function that decided to
return NULL when the two ints combined overflowed int.

Yes, the claim about not all strict functions guarantee this (like
int4pl) made sense.

Is your preference in this kind of case to comment the code and/or
tests but stick with int4pl even though it doesn't demonstrate the
"problem", or try to engineer a different test case such that the
*_holds results in the tests don't seem to imply we could prove more
things than we do?

I think using x+x or whatever is fine. I doubt there's a need to
invent some new function that returns NULL on non-NULL input. The
code you're adding has no idea about the difference between the two.
It has no way to know that anyway.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#27

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#22)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

On Tue, Jan 15, 2019 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I quite dislike taking the responsibility out of clause_is_strict_for
and putting it in the callers.

The reason I moved it was that we're no longer just proving
strictness, so it seemed odd to put it in a function specifically
named to be about proving strictness.

Well, as I said upthread, it seems like we need to think a bit more
carefully about what it is that clause_is_strict_for is testing ---
and if that ends up finding that some other name is more apposite,
I'd not have any hesitation about renaming it. But we're really
missing a bet if the ScalarArrayOp-specific check isn't inside that
recursive check of an "atomic" expression's properties. The
routines above there only recurse through things that act like
AND or OR, but clause_is_strict_for is capable of descending
through other stuff as long as it's strict in some sense. What
we need to be clear about here is exactly what that sense is.

regards, tom lane

#28

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: David Rowley (#26)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

David Rowley <david.rowley@2ndquadrant.com> writes:

On Wed, 16 Jan 2019 at 14:29, James Coleman <jtc331@gmail.com> wrote:

Is your preference in this kind of case to comment the code and/or
tests but stick with int4pl even though it doesn't demonstrate the
"problem", or try to engineer a different test case such that the
*_holds results in the tests don't seem to imply we could prove more
things than we do?

I think using x+x or whatever is fine. I doubt there's a need to
invent some new function that returns NULL on non-NULL input. The
code you're adding has no idea about the difference between the two.
It has no way to know that anyway.

Yeah, I never intended that the *_holds results would be "exact" in
every test case. They're mostly there to catch egregious errors in
test construction. Since the code we're actually testing doesn't get
to see the input or output values of the test cases, it's not really
useful to insist that the test cases exercise every possible combination
of input/output values for the given expressions.

regards, tom lane

#29

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#27)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, Jan 15, 2019 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, as I said upthread, it seems like we need to think a bit more
carefully about what it is that clause_is_strict_for is testing ---
and if that ends up finding that some other name is more apposite,
I'd not have any hesitation about renaming it. But we're really
missing a bet if the ScalarArrayOp-specific check isn't inside that
recursive check of an "atomic" expression's properties. The
routines above there only recurse through things that act like
AND or OR, but clause_is_strict_for is capable of descending
through other stuff as long as it's strict in some sense. What
we need to be clear about here is exactly what that sense is.

All right, I'll look over all of the other callers and see what makes
sense as far as naming (or perhaps consider a new parallel function;
unsure at this point.)

I might also try to see if we can edit the tests slightly to require
the recursion case to be exercised.

One other question on testing: do you think the "calculated array"
tests are good enough by themselves (i.e., remove the ones with array
constants of > 100 values)? I dislike that it's not as obvious what's
going on, but given that the code shouldn't care about array size
anyway...so if there's an inclination to fewer tests that's the first
place I'd look at cutting them.

James Coleman

#30

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#29)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

One other question on testing: do you think the "calculated array"
tests are good enough by themselves (i.e., remove the ones with array
constants of > 100 values)? I dislike that it's not as obvious what's
going on, but given that the code shouldn't care about array size
anyway...so if there's an inclination to fewer tests that's the first
place I'd look at cutting them.

I don't have a strong opinion about that at this point. It might be
clearer once the patch is finished; for now, there's no harm in erring
towards the more-tests direction.

regards, tom lane

#31

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: James Coleman (#29)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Wed, Jan 16, 2019 at 8:49 AM James Coleman <jtc331@gmail.com> wrote:

On Tue, Jan 15, 2019 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, as I said upthread, it seems like we need to think a bit more
carefully about what it is that clause_is_strict_for is testing ---
and if that ends up finding that some other name is more apposite,
I'd not have any hesitation about renaming it. But we're really
missing a bet if the ScalarArrayOp-specific check isn't inside that
recursive check of an "atomic" expression's properties. The
routines above there only recurse through things that act like
AND or OR, but clause_is_strict_for is capable of descending
through other stuff as long as it's strict in some sense. What
we need to be clear about here is exactly what that sense is.

All right, I'll look over all of the other callers and see what makes
sense as far as naming (or perhaps consider a new parallel function;
unsure at this point.)

I investigated all of the callers of clause_is_strict_for and
confirmed they fall into two buckets:
- Recursive case inside the function itself.
- Callers proving a variant of IS [NOT] NULL.

Given all cases appear to be safe to extend to scalar array ops, I've
tentatively renamed the function clause_proved_for_null_test and moved
the code back into that function rather than be in the caller. This
also guarantees that beyond proving implication of IS NOT NULL and
refutation of IS NULL we also get proof of weak refutation of strict
predicates (since IS NOT NULL refutes them and we strongly imply IS
NOT NULL); I've added tests for this case.

I added many more comments explaining the proofs here, but it boils
down to the fact that non-empty array ops are really just a strict
clause, and the empty array case isn't strict, but when not returning
NULL returns false, and therefore we can still make the same proofs.

Since the empty array case is really the only interesting one, I
brought back the empty array tests, but modified them to use
calculated/non-constant arrays so that they actually hit the new code
paths. I also verified that the proofs they make are a subset of the
ones we get from existing code for constant empty arrays (it makes
sense we'd be able to prove less about possibly empty arrays than
known empty arrays.)

I might also try to see if we can edit the tests slightly to require
the recursion case to be exercised.

I didn't tackle this, but if you think it would add real value, then I
can look into it further.

I also updated the commit message to better match the more extended
implications of this change over just the IS NOT NULL case I was
originally pursuing.

One other thing: for known non-empty arrays we could further extend
this to prove that an IS NULL clause weakly implies the ScalarArrayOp.
I'm not sure this is worth is it; if someone things otherwise let me
know.

James Coleman

Attachments:

saop_is_not_null-v7.patchapplication/octet-stream; name=saop_is_not_null-v7.patchDownload

commit eabf42512fdb66bbe56b0e51aefd3682438f5e04
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Extend IS [NOT] NULL predicate proofs for arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). Unfortunately this means that we miss opportunities to prove
    with both large and non-constant array expressions.
    
    But it turns out that all IS [NOT] NULL proofs we already derive for
    strict clauses are valid for ScalarArrayOpExpr's with strict operators
    despite the fact that array ops are not guaranteed to be strict (empty
    arrays can result in false when the LHS is null).
    
    One of the main benefits of this change is that the planner may now use
    partial indexes of the form "WHERE foo IS NOT NULL" when the query's
    WHERE clause involves a scalar array op like "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..cc352f4ac8 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -98,7 +98,7 @@ static bool predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 								   bool weak);
 static Node *extract_not_arg(Node *clause);
 static Node *extract_strong_not_arg(Node *clause);
-static bool clause_is_strict_for(Node *clause, Node *subexpr);
+static bool clause_proved_for_null_test(Node *clause, Node *subexpr);
 static bool operator_predicate_proof(Expr *predicate, Node *clause,
 						 bool refute_it, bool weak);
 static bool operator_same_subexprs_proof(Oid pred_op, Oid clause_op,
@@ -815,7 +815,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1103,6 +1103,12 @@ arrayexpr_cleanup_fn(PredIterInfo info)
  * (Again, this is a safe conclusion because "foo" must be immutable.)
  * This doesn't work for weak implication, though.
  *
+ * Similarly predicates of the form "foo IS NOT NULL" are strongly implied by
+ * the truthfulness of ScalarArrayOpExpr's since with non-empty arrays they are
+ * strict and empty arrays results in false which won't prove anything anyway.
+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
+ *
  * Finally, if both clauses are binary operator expressions, we may be able
  * to prove something using the system's knowledge about operators; those
  * proof rules are encapsulated in operator_predicate_proof().
@@ -1130,7 +1136,7 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 			!ntest->argisrow)
 		{
 			/* strictness of clause for foo implies foo IS NOT NULL */
-			if (clause_is_strict_for(clause, (Node *) ntest->arg))
+			if (clause_proved_for_null_test(clause, (Node *) ntest->arg))
 				return true;
 		}
 		return false;			/* we can't succeed below... */
@@ -1157,10 +1163,17 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
  * implication case), or is "foo IS NOT NULL".  That works for either strong
  * or weak refutation.
  *
+ * Similarly, predicates of the form "foo IS NULL" are refuted by the
+ * truthfulness of ScalarArrayOpExpr's. Unlike implication, this also holds
+ * for weak refutation since empty arrays result in false and thus won't prove
+ * anything anyway.
+ *
  * A clause "foo IS NULL" refutes a predicate "foo IS NOT NULL" in all cases.
  * If we are considering weak refutation, it also refutes a predicate that
  * is strict for "foo", since then the predicate must yield NULL (and since
- * "foo" appears in the predicate, it's known immutable).
+ * "foo" appears in the predicate, it's known immutable). Because
+ * ScalarArrayOpExpr's strongly imply "foo IS NOT NULL", they also weakly refute
+ * strict predicates.
  *
  * (The main motivation for covering these IS [NOT] NULL cases is to support
  * using IS NULL/IS NOT NULL as partition-defining constraints.)
@@ -1193,7 +1206,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 			return false;
 
 		/* strictness of clause for foo refutes foo IS NULL */
-		if (clause_is_strict_for(clause, (Node *) isnullarg))
+		if (clause_proved_for_null_test(clause, (Node *) isnullarg))
 			return true;
 
 		/* foo IS NOT NULL refutes foo IS NULL */
@@ -1225,7 +1238,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 
 		/* foo IS NULL weakly refutes any predicate that is strict for foo */
 		if (weak &&
-			clause_is_strict_for((Node *) predicate, (Node *) isnullarg))
+			clause_proved_for_null_test((Node *) predicate, (Node *) isnullarg))
 			return true;
 
 		return false;			/* we can't succeed below... */
@@ -1292,7 +1305,9 @@ extract_strong_not_arg(Node *clause)
 
 
 /*
- * Can we prove that "clause" returns NULL if "subexpr" does?
+ * Most of this method proves that a clause is strict for a given expression.
+ * That is, it answers the question: Can we prove that "clause" returns NULL if
+ * "subexpr" does?
  *
  * The base case is that clause and subexpr are equal().  (We assume that
  * the caller knows at least one of the input expressions is immutable,
@@ -1300,9 +1315,17 @@ extract_strong_not_arg(Node *clause)
  *
  * We can also report success if the subexpr appears as a subexpression
  * of "clause" in a place where it'd force nullness of the overall result.
+ *
+ * ScalarArrayOpExpr's are a special case. With the exception of empty arrays
+ * we can treat them as any other operator expression and verify strictness of
+ * the operator. However an empty array results in false rather than NULL when
+ * the argument is NULL, so these ops aren't actually strict. But since our
+ * predicate proofs of implication and refutation both only matter when the
+ * expression is true, we are able to prove claims about "IS [NOT] NULL" clauses
+ * anyway.
  */
 static bool
-clause_is_strict_for(Node *clause, Node *subexpr)
+clause_proved_for_null_test(Node *clause, Node *subexpr)
 {
 	ListCell   *lc;
 
@@ -1335,7 +1358,7 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((OpExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_proved_for_null_test((Node *) lfirst(lc), subexpr))
 				return true;
 		}
 		return false;
@@ -1345,12 +1368,26 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((FuncExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_proved_for_null_test((Node *) lfirst(lc), subexpr))
 				return true;
 		}
 		return false;
 	}
 
+	/*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays having at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * large arrays separately (this is also useful for non-constant array
+	 * expressions).
+	 */
+	if (clause && IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno) &&
+				clause_proved_for_null_test((Node *) linitial(saop->args), subexpr))
+			return true;
+	}
+
 	return false;
 }
 
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..6f77ac6614 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,393 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array exppresions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..9a7cd8ee3e 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,204 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array exppresions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);

#32

Alvaro Herrera

alvherre@2ndquadrant.com

almost 7 years ago

In reply to: James Coleman (#31)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

Hello, I gave this patch a very quick scan. I didn't check the actual
logic behind it.

This comment seems wrong:

+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.

"non-falsity does not imply non-falsity"? I suppose one of those
negations should be different ...

I think the name clause_proved_for_null_test() is a bit weird, being in
the past tense. I'd maybe change "proved" to "proves".

s/exppresions/expresions/ in the test files.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Alvaro Herrera (#32)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Tue, Jan 22, 2019 at 4:26 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hello, I gave this patch a very quick scan. I didn't check the actual
logic behind it.

This comment seems wrong:
+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
"non-falsity does not imply non-falsity"? I suppose one of those
negations should be different ...

Earlier in the file weak implication (comments above
predicate_implied_by) is defined as "non-falsity of A implies
non-falsity of B". In this example we have NULL for A (non-false) but
false for B, so that definition doesn't hold. So I think the comment
is accurate, but I can reword if you have an idea of what you'd like
to see (I've tweaked a bit in the attached patch to start).

I think the name clause_proved_for_null_test() is a bit weird, being in
the past tense. I'd maybe change "proved" to "proves".

Changed.

s/exppresions/expresions/ in the test files.

Fixed.

James Coleman

Attachments:

saop_is_not_null-v8.patchapplication/octet-stream; name=saop_is_not_null-v8.patchDownload

commit d899833f8d329610a237d2269ac4580b27a709d2
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Extend IS [NOT] NULL predicate proofs for arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). Unfortunately this means that we miss opportunities to prove
    with both large and non-constant array expressions.
    
    But it turns out that all IS [NOT] NULL proofs we already derive for
    strict clauses are valid for ScalarArrayOpExpr's with strict operators
    despite the fact that array ops are not guaranteed to be strict (empty
    arrays can result in false when the LHS is null).
    
    One of the main benefits of this change is that the planner may now use
    partial indexes of the form "WHERE foo IS NOT NULL" when the query's
    WHERE clause involves a scalar array op like "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..fab90d97c4 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -98,7 +98,7 @@ static bool predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 								   bool weak);
 static Node *extract_not_arg(Node *clause);
 static Node *extract_strong_not_arg(Node *clause);
-static bool clause_is_strict_for(Node *clause, Node *subexpr);
+static bool clause_proves_for_null_test(Node *clause, Node *subexpr);
 static bool operator_predicate_proof(Expr *predicate, Node *clause,
 						 bool refute_it, bool weak);
 static bool operator_same_subexprs_proof(Oid pred_op, Oid clause_op,
@@ -815,7 +815,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1103,6 +1103,12 @@ arrayexpr_cleanup_fn(PredIterInfo info)
  * (Again, this is a safe conclusion because "foo" must be immutable.)
  * This doesn't work for weak implication, though.
  *
+ * Similarly predicates of the form "foo IS NOT NULL" are strongly implied by
+ * the truthfulness of ScalarArrayOpExpr's since with non-empty arrays they are
+ * strict and empty arrays results in false which won't prove anything anyway.
+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
+ *
  * Finally, if both clauses are binary operator expressions, we may be able
  * to prove something using the system's knowledge about operators; those
  * proof rules are encapsulated in operator_predicate_proof().
@@ -1130,7 +1136,7 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 			!ntest->argisrow)
 		{
 			/* strictness of clause for foo implies foo IS NOT NULL */
-			if (clause_is_strict_for(clause, (Node *) ntest->arg))
+			if (clause_proves_for_null_test(clause, (Node *) ntest->arg))
 				return true;
 		}
 		return false;			/* we can't succeed below... */
@@ -1157,10 +1163,17 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
  * implication case), or is "foo IS NOT NULL".  That works for either strong
  * or weak refutation.
  *
+ * Similarly, predicates of the form "foo IS NULL" are refuted by the
+ * truthfulness of ScalarArrayOpExpr's. Unlike implication, this also holds
+ * for weak refutation since empty arrays result in false and thus won't prove
+ * anything anyway.
+ *
  * A clause "foo IS NULL" refutes a predicate "foo IS NOT NULL" in all cases.
  * If we are considering weak refutation, it also refutes a predicate that
  * is strict for "foo", since then the predicate must yield NULL (and since
- * "foo" appears in the predicate, it's known immutable).
+ * "foo" appears in the predicate, it's known immutable). Because
+ * ScalarArrayOpExpr's strongly imply "foo IS NOT NULL", they also weakly refute
+ * strict predicates.
  *
  * (The main motivation for covering these IS [NOT] NULL cases is to support
  * using IS NULL/IS NOT NULL as partition-defining constraints.)
@@ -1193,7 +1206,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 			return false;
 
 		/* strictness of clause for foo refutes foo IS NULL */
-		if (clause_is_strict_for(clause, (Node *) isnullarg))
+		if (clause_proves_for_null_test(clause, (Node *) isnullarg))
 			return true;
 
 		/* foo IS NOT NULL refutes foo IS NULL */
@@ -1225,7 +1238,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 
 		/* foo IS NULL weakly refutes any predicate that is strict for foo */
 		if (weak &&
-			clause_is_strict_for((Node *) predicate, (Node *) isnullarg))
+			clause_proves_for_null_test((Node *) predicate, (Node *) isnullarg))
 			return true;
 
 		return false;			/* we can't succeed below... */
@@ -1292,7 +1305,9 @@ extract_strong_not_arg(Node *clause)
 
 
 /*
- * Can we prove that "clause" returns NULL if "subexpr" does?
+ * Most of this method proves that a clause is strict for a given expression.
+ * That is, it answers the question: Can we prove that "clause" returns NULL if
+ * "subexpr" does?
  *
  * The base case is that clause and subexpr are equal().  (We assume that
  * the caller knows at least one of the input expressions is immutable,
@@ -1300,9 +1315,17 @@ extract_strong_not_arg(Node *clause)
  *
  * We can also report success if the subexpr appears as a subexpression
  * of "clause" in a place where it'd force nullness of the overall result.
+ *
+ * ScalarArrayOpExpr's are a special case. With the exception of empty arrays
+ * we can treat them as any other operator expression and verify strictness of
+ * the operator. However an empty array results in false rather than NULL when
+ * the argument is NULL, so these ops aren't actually strict. But since our
+ * predicate proofs of implication and refutation both only matter when the
+ * expression is true, we are able to prove claims about "IS [NOT] NULL" clauses
+ * anyway.
  */
 static bool
-clause_is_strict_for(Node *clause, Node *subexpr)
+clause_proves_for_null_test(Node *clause, Node *subexpr)
 {
 	ListCell   *lc;
 
@@ -1335,7 +1358,7 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((OpExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_proves_for_null_test((Node *) lfirst(lc), subexpr))
 				return true;
 		}
 		return false;
@@ -1345,12 +1368,26 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((FuncExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_proves_for_null_test((Node *) lfirst(lc), subexpr))
 				return true;
 		}
 		return false;
 	}
 
+	/*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays having at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * large arrays separately (this is also useful for non-constant array
+	 * expressions).
+	 */
+	if (clause && IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno) &&
+				clause_proves_for_null_test((Node *) linitial(saop->args), subexpr))
+			return true;
+	}
+
 	return false;
 }
 
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..046bb3942c 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,393 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array expressions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..10198b27d7 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,204 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array expressions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);

#34

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: James Coleman (#33)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

This comment seems wrong:
+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
"non-falsity does not imply non-falsity"? I suppose one of those
negations should be different ...
Earlier in the file weak implication (comments above
predicate_implied_by) is defined as "non-falsity of A implies
non-falsity of B". In this example we have NULL for A (non-false) but
false for B, so that definition doesn't hold. So I think the comment
is accurate, but I can reword if you have an idea of what you'd like
to see (I've tweaked a bit in the attached patch to start).

I forgot to update in v8 so attaching v9.

James Coleman

Attachments:

saop_is_not_null-v9.patchapplication/octet-stream; name=saop_is_not_null-v9.patchDownload

commit d899833f8d329610a237d2269ac4580b27a709d2
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Extend IS [NOT] NULL predicate proofs for arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). Unfortunately this means that we miss opportunities to prove
    with both large and non-constant array expressions.
    
    But it turns out that all IS [NOT] NULL proofs we already derive for
    strict clauses are valid for ScalarArrayOpExpr's with strict operators
    despite the fact that array ops are not guaranteed to be strict (empty
    arrays can result in false when the LHS is null).
    
    One of the main benefits of this change is that the planner may now use
    partial indexes of the form "WHERE foo IS NOT NULL" when the query's
    WHERE clause involves a scalar array op like "foo IN (1,2,...101)".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3d5ef6922c..fab90d97c4 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -98,7 +98,7 @@ static bool predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 								   bool weak);
 static Node *extract_not_arg(Node *clause);
 static Node *extract_strong_not_arg(Node *clause);
-static bool clause_is_strict_for(Node *clause, Node *subexpr);
+static bool clause_proves_for_null_test(Node *clause, Node *subexpr);
 static bool operator_predicate_proof(Expr *predicate, Node *clause,
 						 bool refute_it, bool weak);
 static bool operator_same_subexprs_proof(Oid pred_op, Oid clause_op,
@@ -815,7 +815,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1103,6 +1103,12 @@ arrayexpr_cleanup_fn(PredIterInfo info)
  * (Again, this is a safe conclusion because "foo" must be immutable.)
  * This doesn't work for weak implication, though.
  *
+ * Similarly predicates of the form "foo IS NOT NULL" are strongly implied by
+ * the truthfulness of ScalarArrayOpExpr's since with non-empty arrays they are
+ * strict and empty arrays results in false which won't prove anything anyway.
+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
+ *
  * Finally, if both clauses are binary operator expressions, we may be able
  * to prove something using the system's knowledge about operators; those
  * proof rules are encapsulated in operator_predicate_proof().
@@ -1130,7 +1136,7 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 			!ntest->argisrow)
 		{
 			/* strictness of clause for foo implies foo IS NOT NULL */
-			if (clause_is_strict_for(clause, (Node *) ntest->arg))
+			if (clause_proves_for_null_test(clause, (Node *) ntest->arg))
 				return true;
 		}
 		return false;			/* we can't succeed below... */
@@ -1157,10 +1163,17 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
  * implication case), or is "foo IS NOT NULL".  That works for either strong
  * or weak refutation.
  *
+ * Similarly, predicates of the form "foo IS NULL" are refuted by the
+ * truthfulness of ScalarArrayOpExpr's. Unlike implication, this also holds
+ * for weak refutation since empty arrays result in false and thus won't prove
+ * anything anyway.
+ *
  * A clause "foo IS NULL" refutes a predicate "foo IS NOT NULL" in all cases.
  * If we are considering weak refutation, it also refutes a predicate that
  * is strict for "foo", since then the predicate must yield NULL (and since
- * "foo" appears in the predicate, it's known immutable).
+ * "foo" appears in the predicate, it's known immutable). Because
+ * ScalarArrayOpExpr's strongly imply "foo IS NOT NULL", they also weakly refute
+ * strict predicates.
  *
  * (The main motivation for covering these IS [NOT] NULL cases is to support
  * using IS NULL/IS NOT NULL as partition-defining constraints.)
@@ -1193,7 +1206,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 			return false;
 
 		/* strictness of clause for foo refutes foo IS NULL */
-		if (clause_is_strict_for(clause, (Node *) isnullarg))
+		if (clause_proves_for_null_test(clause, (Node *) isnullarg))
 			return true;
 
 		/* foo IS NOT NULL refutes foo IS NULL */
@@ -1225,7 +1238,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 
 		/* foo IS NULL weakly refutes any predicate that is strict for foo */
 		if (weak &&
-			clause_is_strict_for((Node *) predicate, (Node *) isnullarg))
+			clause_proves_for_null_test((Node *) predicate, (Node *) isnullarg))
 			return true;
 
 		return false;			/* we can't succeed below... */
@@ -1292,7 +1305,9 @@ extract_strong_not_arg(Node *clause)
 
 
 /*
- * Can we prove that "clause" returns NULL if "subexpr" does?
+ * Most of this method proves that a clause is strict for a given expression.
+ * That is, it answers the question: Can we prove that "clause" returns NULL if
+ * "subexpr" does?
  *
  * The base case is that clause and subexpr are equal().  (We assume that
  * the caller knows at least one of the input expressions is immutable,
@@ -1300,9 +1315,17 @@ extract_strong_not_arg(Node *clause)
  *
  * We can also report success if the subexpr appears as a subexpression
  * of "clause" in a place where it'd force nullness of the overall result.
+ *
+ * ScalarArrayOpExpr's are a special case. With the exception of empty arrays
+ * we can treat them as any other operator expression and verify strictness of
+ * the operator. However an empty array results in false rather than NULL when
+ * the argument is NULL, so these ops aren't actually strict. But since our
+ * predicate proofs of implication and refutation both only matter when the
+ * expression is true, we are able to prove claims about "IS [NOT] NULL" clauses
+ * anyway.
  */
 static bool
-clause_is_strict_for(Node *clause, Node *subexpr)
+clause_proves_for_null_test(Node *clause, Node *subexpr)
 {
 	ListCell   *lc;
 
@@ -1335,7 +1358,7 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((OpExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_proves_for_null_test((Node *) lfirst(lc), subexpr))
 				return true;
 		}
 		return false;
@@ -1345,12 +1368,26 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((FuncExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_proves_for_null_test((Node *) lfirst(lc), subexpr))
 				return true;
 		}
 		return false;
 	}
 
+	/*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays having at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * large arrays separately (this is also useful for non-constant array
+	 * expressions).
+	 */
+	if (clause && IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno) &&
+				clause_proves_for_null_test((Node *) linitial(saop->args), subexpr))
+			return true;
+	}
+
 	return false;
 }
 
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..046bb3942c 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -837,3 +837,393 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array expressions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..10198b27d7 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -325,3 +325,204 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next few tests, We want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array expressions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is not null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select y is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select (x + x) is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x is null, x = any((select vals from a)::int[])
+from integers
+$$);
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (
+  select array_agg(i) || null from generate_series(1, 101) t(i)
+)
+select x = any((select vals from a)::int[]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+with a(vals) as (select array[]::int[])
+select x = any((select vals from a)::int[]), x is null
+from integers
+$$);

#35

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#34)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

I forgot to update in v8 so attaching v9.

So ... this is actively wrong.

This case shows that you can't ignore the empty-array possibility
for a ScalarArrayOpExpr with useOr = false, because
"SELECT null::int = all(array[]::int[])" yields TRUE:

This case shows that you can't ignore the distinction between null
and false results once you've descended through strict function(s):

(In this usage, the strictf() function from the test_predtest.sql
will simply return the NOT of its second input; so it gives different
answers depending on whether the SAOP returned false or null.)

I think you could probably repair the second problem by adding
an additional argument to clause_is_strict_for() indicating whether
it has to prove the clause to be NULL or merely non-TRUE; when
recursing you'd always have to ask for the former.

The first problem could be resolved by insisting on being able to
prove the array non-empty when !useOr, but I'm not sure if it's
really worth the trouble, as opposed to just not trying to prove
anything for !useOr cases. I'm not sure that "x op ALL(ARRAY)"
occurs often enough in the wild to justify expending code on.

The other case where being able to prove the array nonempty might
be worth something is when you've recursed and hence need to be
able to prove the SAOP's result to be NULL not just not-TRUE.
Again though, I don't know how often that occurs in practice,
so even the combination of those two motivations might not be
worth having code to check the array argument.

It'd also be worth spending some brain cells on what happens
when the SAOP's array argument is null overall, which causes
its result to be null (not false). Maybe the logic goes
through without needing any explicit test for that case
(and indeed I couldn't break it in testing that), but it'd
likely be worth a comment.

I don't especially care for the proposed function name
"clause_proves_for_null_test". The only thing that brings to my
mind is the question "proves WHAT, exactly?" --- and as these
examples show, being crystal clear on what it proves is essential.

regards, tom lane

#36

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Tom Lane (#35)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

BTW, I just pushed a fix (e04a3905e) that adds a little more code
to clause_is_strict_for(). This shouldn't cause more than minor
rebasing pain for you, I hope.

regards, tom lane

#37

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#35)

1 attachment(s)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Mon, Feb 18, 2019 at 4:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So ... this is actively wrong.

This case shows that you can't ignore the empty-array possibility
for a ScalarArrayOpExpr with useOr = false, because
"SELECT null::int = all(array[]::int[])" yields TRUE:

contrib_regression=# select * from test_predtest($$
select x is not null, x = all(array(select i from generate_series(1,0) i))
from integers
$$);
WARNING: strong_implied_by result is incorrect
-[ RECORD 1 ]-----+--
strong_implied_by | t
weak_implied_by | f
strong_refuted_by | f
weak_refuted_by | f
s_i_holds | f
w_i_holds | f
s_r_holds | f
w_r_holds | f

This case shows that you can't ignore the distinction between null
and false results once you've descended through strict function(s):

contrib_regression=# select * from test_predtest($$
select x is not null, strictf(true, x = any(array(select i from generate_series(1,0) i)))
from integers
$$);
WARNING: strong_implied_by result is incorrect
-[ RECORD 1 ]-----+--
strong_implied_by | t
weak_implied_by | f
strong_refuted_by | f
weak_refuted_by | f
s_i_holds | f
w_i_holds | f
s_r_holds | f
w_r_holds | f

(In this usage, the strictf() function from the test_predtest.sql
will simply return the NOT of its second input; so it gives different
answers depending on whether the SAOP returned false or null.)

I think you could probably repair the second problem by adding
an additional argument to clause_is_strict_for() indicating whether
it has to prove the clause to be NULL or merely non-TRUE; when
recursing you'd always have to ask for the former.

I've implemented this fix as suggested. The added argument I've
initially called "allow_false"; I don't love the name, but it is
directly what it does. I'd appreciate other suggestions (if you prefer
it change).

The first problem could be resolved by insisting on being able to
prove the array non-empty when !useOr, but I'm not sure if it's
really worth the trouble, as opposed to just not trying to prove
anything for !useOr cases. I'm not sure that "x op ALL(ARRAY)"
occurs often enough in the wild to justify expending code on.

The other case where being able to prove the array nonempty might
be worth something is when you've recursed and hence need to be
able to prove the SAOP's result to be NULL not just not-TRUE.
Again though, I don't know how often that occurs in practice,
so even the combination of those two motivations might not be
worth having code to check the array argument.

I've implemented this also; the thing that puts it over the edge for
me on the question of "is it worth it in the wild" is cases like "x
!op ALL(ARRAY)", since it seems plausible to me that that's (unlike
the non-negated case) a reasonably common operation. At the very least
it seems likely to be a more logically interesting operation, which
I'm taking as a proxy for common, I suppose.

It'd also be worth spending some brain cells on what happens
when the SAOP's array argument is null overall, which causes
its result to be null (not false). Maybe the logic goes
through without needing any explicit test for that case
(and indeed I couldn't break it in testing that), but it'd
likely be worth a comment.

Since all SAOPs (both ALL and ANY) return NULL when the array is NULL
regardless of the LHS argument, then this case qualifies as strict.
I've included this in the above fix since both NULL constant arrays
and non-empty arrays can both be proven strict with strict operators.

Ideally I think this case would be handled elsewhere in the optimizer
by directly replacing the saop and a constant NULL array with NULL,
but this isn't the patch to do that, and at any rate I'd have to do a
bit more investigation to understand how to do such (if it's easily
possible).

I don't especially care for the proposed function name
"clause_proves_for_null_test". The only thing that brings to my
mind is the question "proves WHAT, exactly?" --- and as these
examples show, being crystal clear on what it proves is essential.

Given the new argument, I've reverted the name change.

I also updated the tests with a new helper function "opaque_array" to
make it easy to test cases where the planner can't inspect the array.
That way we don't need to rely on CTEs as an optimization fence --
thus improving both maintainability and readability. I also noticed
there were also some tests where the array was already opaque yet I
was still using CTEs as an optimization fence; I've cleaned those up.

I've updated the comments significantly to reflect the above changes.

I've attached an updated (and rebased) patch.

James Coleman

Attachments:

saop_is_not_null-v10.patchtext/x-patch; charset=US-ASCII; name=saop_is_not_null-v10.patchDownload

commit 2cf8f951ca6524d4eb4c927d9903736a0751c89c
Author: jcoleman <jtc331@gmail.com>
Date:   Wed Nov 14 16:49:52 2018 +0000

    Extend IS [NOT] NULL predicate proofs for arrays
    
    For the purposes of predicate proof testing we limit ScalarArrayOpExpr
    decomposition to arrays with <= MAX_SAOP_ARRAY_SIZE items (currently
    100 items). Unfortunately this means that we miss opportunities for
    proofs with both large and non-constant array expressions.
    
    It's trivially shown that a scalar array op with a known non-empty array
    is strict so long its operator is strict; similarly ops with constant
    NULL arrays always result in NULL and are therefore strict.
    
    If we don't know an array is NULL or non-empty, things are a bit more
    complicated. For ANY ops an empty array results in FALSE, so we can
    continue to derive  proofs since in the base case we only care about
    expressions that result in TRUE for proof correctness.
    
    For ALL ops we can't make any assumptions: if the array isn't known to
    be empty or NULL we have to pass on proving [NOT] NULL.
    
    One other exception: once we've recursed through a strict function we
    have to require strictness (rather than our relaxed "can be FALSE for
    ANY ops" definition in the base case) since a strict function may return
    TRUE at arbitrary points.
    
    One of the main benefits of this change is that the planner may now use
    partial indexes of the form "WHERE foo IS NOT NULL" when the query's
    WHERE clause involves a scalar array op like "foo IN (1,2,...101)" or
    "foo != ALL(ARRAY[1,2,...101])".

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 3c9f245e4d..214a85f701 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -99,7 +99,7 @@ static bool predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 								   bool weak);
 static Node *extract_not_arg(Node *clause);
 static Node *extract_strong_not_arg(Node *clause);
-static bool clause_is_strict_for(Node *clause, Node *subexpr);
+static bool clause_is_strict_for(Node *clause, Node *subexpr, bool allow_false);
 static bool operator_predicate_proof(Expr *predicate, Node *clause,
 						 bool refute_it, bool weak);
 static bool operator_same_subexprs_proof(Oid pred_op, Oid clause_op,
@@ -816,7 +816,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
  * This function also implements enforcement of MAX_SAOP_ARRAY_SIZE: if a
  * ScalarArrayOpExpr's array has too many elements, we just classify it as an
  * atom.  (This will result in its being passed as-is to the simple_clause
- * functions, which will fail to prove anything about it.)	Note that we
+ * functions, many of which will fail to prove anything about it.)  Note that we
  * cannot just stop after considering MAX_SAOP_ARRAY_SIZE elements; in general
  * that would result in wrong proofs, rather than failing to prove anything.
  */
@@ -1104,6 +1104,14 @@ arrayexpr_cleanup_fn(PredIterInfo info)
  * (Again, this is a safe conclusion because "foo" must be immutable.)
  * This doesn't work for weak implication, though.
  *
+ * Similarly predicates of the form "foo IS NOT NULL" are strongly implied by
+ * the truthfulness of OR'd ScalarArrayOpExpr's since with non-empty arrays they
+ * are strict and empty arrays results in false which won't prove anything anyway.
+ * They are also implied by AND'd ScalarArrayOpExpr's as long as the array is
+ * known to be NULL or non-empty. However weak implication fails: e.g.,
+ * "NULL IS NOT NULL" is false, but "NULL = ANY(ARRAY[NULL])" is NULL, so
+ * non-falsity does not imply non-falsity.
+ *
  * Finally, if both clauses are binary operator expressions, we may be able
  * to prove something using the system's knowledge about operators; those
  * proof rules are encapsulated in operator_predicate_proof().
@@ -1131,7 +1139,7 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 			!ntest->argisrow)
 		{
 			/* strictness of clause for foo implies foo IS NOT NULL */
-			if (clause_is_strict_for(clause, (Node *) ntest->arg))
+			if (clause_is_strict_for(clause, (Node *) ntest->arg, true))
 				return true;
 		}
 		return false;			/* we can't succeed below... */
@@ -1158,10 +1166,16 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
  * implication case), or is "foo IS NOT NULL".  That works for either strong
  * or weak refutation.
  *
+ * Similarly, predicates of the form "foo IS NULL" are refuted by the
+ * truthfulness of OR'd and AND'd (with known NULL or non-empty arrays)
+ * ScalarArrayOpExpr's. Unlike implication, this also holds for weak refutation
+ * since empty arrays result in false and thus won't prove anything anyway.
+ *
  * A clause "foo IS NULL" refutes a predicate "foo IS NOT NULL" in all cases.
  * If we are considering weak refutation, it also refutes a predicate that
  * is strict for "foo", since then the predicate must yield NULL (and since
- * "foo" appears in the predicate, it's known immutable).
+ * "foo" appears in the predicate, it's known immutable). ScalarArrayOpExpr's
+ * that strongly imply "foo IS NOT NULL" also weakly refute strict predicates.
  *
  * (The main motivation for covering these IS [NOT] NULL cases is to support
  * using IS NULL/IS NOT NULL as partition-defining constraints.)
@@ -1194,7 +1208,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 			return false;
 
 		/* strictness of clause for foo refutes foo IS NULL */
-		if (clause_is_strict_for(clause, (Node *) isnullarg))
+		if (clause_is_strict_for(clause, (Node *) isnullarg, true))
 			return true;
 
 		/* foo IS NOT NULL refutes foo IS NULL */
@@ -1226,7 +1240,7 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
 
 		/* foo IS NULL weakly refutes any predicate that is strict for foo */
 		if (weak &&
-			clause_is_strict_for((Node *) predicate, (Node *) isnullarg))
+			clause_is_strict_for((Node *) predicate, (Node *) isnullarg, true))
 			return true;
 
 		return false;			/* we can't succeed below... */
@@ -1293,7 +1307,9 @@ extract_strong_not_arg(Node *clause)
 
 
 /*
- * Can we prove that "clause" returns NULL if "subexpr" does?
+ * Most of this method proves that a clause is strict for a given expression.
+ * That is, it answers the question: Can we prove that "clause" returns NULL if
+ * "subexpr" does?
  *
  * The base case is that clause and subexpr are equal().  (We assume that
  * the caller knows at least one of the input expressions is immutable,
@@ -1301,9 +1317,32 @@ extract_strong_not_arg(Node *clause)
  *
  * We can also report success if the subexpr appears as a subexpression
  * of "clause" in a place where it'd force nullness of the overall result.
+ *
+ * ScalarArrayOpExpr's are a special case. With the exception of empty arrays
+ * we can treat them as any other operator expression and verify strictness of
+ * the operator.
+ *
+ * However an empty array results in either false (for ANY) or true (for ALL).
+ * Because the result is a real boolean (instead of NULL) even when when the
+ * argument is NULL, scalar array ops aren't actually strict.
+ *
+ * We can safely ignore this in the base case for ANY since the predicate proofs
+ * of implication and refutation both only matter when the expression is true;
+ * we are able to prove claims about "IS [NOT] NULL" clauses anyway. The same
+ * holds true if the array itself is NULL since the result is always NULL.
+ *
+ * In constrast the ALL base case requires us to additionally prove the array
+ * is not empty since a TRUE result means the implication or refutation must
+ * hold. We can also prove for known NULL arrays since the result is always
+ * NULL.
+ *
+ * In both cases we have to fallback to requiring fully strict behavior (and
+ * thus be able to prove the array is either non-empty or NULL) as soon as we've
+ * recursed through a strict function (since such a function is free to return
+ * TRUE at any point so long as the argument is NOT NULL).
  */
 static bool
-clause_is_strict_for(Node *clause, Node *subexpr)
+clause_is_strict_for(Node *clause, Node *subexpr, bool allow_false)
 {
 	ListCell   *lc;
 
@@ -1336,7 +1375,7 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((OpExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_is_strict_for((Node *) lfirst(lc), subexpr, false))
 				return true;
 		}
 		return false;
@@ -1346,7 +1385,7 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	{
 		foreach(lc, ((FuncExpr *) clause)->args)
 		{
-			if (clause_is_strict_for((Node *) lfirst(lc), subexpr))
+			if (clause_is_strict_for((Node *) lfirst(lc), subexpr, false))
 				return true;
 		}
 		return false;
@@ -1362,16 +1401,66 @@ clause_is_strict_for(Node *clause, Node *subexpr)
 	 */
 	if (IsA(clause, CoerceViaIO))
 		return clause_is_strict_for((Node *) ((CoerceViaIO *) clause)->arg,
-									subexpr);
+									subexpr, false);
 	if (IsA(clause, ArrayCoerceExpr))
 		return clause_is_strict_for((Node *) ((ArrayCoerceExpr *) clause)->arg,
-									subexpr);
+									subexpr, false);
 	if (IsA(clause, ConvertRowtypeExpr))
 		return clause_is_strict_for((Node *) ((ConvertRowtypeExpr *) clause)->arg,
-									subexpr);
+									subexpr, false);
 	if (IsA(clause, CoerceToDomain))
 		return clause_is_strict_for((Node *) ((CoerceToDomain *) clause)->arg,
-									subexpr);
+									subexpr, false);
+
+	/*
+	 * Since we limit decomposing ScalarArrayOpExpr nodes into AND/OR quals
+	 * to arrays having at most MAX_SAOP_ARRAY_SIZE items, we need to handle
+	 * large arrays separately (this is also useful for non-constant array
+	 * expressions).
+	 */
+	if (clause && IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		if (op_strict(saop->opno) &&
+				clause_is_strict_for((Node *) linitial(saop->args), subexpr, false))
+		{
+			if (!allow_false|| !saop->useOr)
+			{
+				Node *arraynode = (Node *) lsecond(saop->args);
+				int nelems = 0;
+
+				if (arraynode && IsA(arraynode, Const))
+				{
+					ArrayType  *arrval;
+					Const *arrayconst = (Const *) arraynode;
+					/*
+					 * Scalar array ops with NULL array constants always result
+					 * in NULL so are strict.
+					 */
+					if (arrayconst->constisnull)
+						return true;
+
+					arrval = DatumGetArrayTypeP(arrayconst->constvalue);
+					nelems = ArrayGetNItems(ARR_NDIM(arrval), ARR_DIMS(arrval));
+				}
+				else if (arraynode && IsA(arraynode, ArrayExpr) &&
+						 !((ArrayExpr *) arraynode)->multidims)
+					nelems = list_length(((ArrayExpr *) arraynode)->elements);
+
+				/*
+				 * Empty AND/OR groups resolve to TRUE or FALSE rather than null,
+				 * so in addition to verifying the operation is otherwise strict,
+				 * we also have to verify we have a non-empty array.
+				 */
+				if (nelems > 0)
+					return true;
+				else
+					return false;
+			}
+			else
+			  return true;
+		}
+	}
 
 	return false;
 }
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out b/src/test/modules/test_predtest/expected/test_predtest.out
index 5574e03204..11e635a30a 100644
--- a/src/test/modules/test_predtest/expected/test_predtest.out
+++ b/src/test/modules/test_predtest/expected/test_predtest.out
@@ -19,6 +19,9 @@ from generate_series(0, 11*11-1) i;
 -- and a simple strict function that's opaque to the optimizer
 create function strictf(bool, bool) returns bool
 language plpgsql as $$begin return $1 and not $2; end$$ strict;
+-- a simple function make arrays opaque to the optimizer
+create function opaque_array(int[]) returns int[]
+language plpgsql as $$begin return $1; end$$ strict;
 -- Basic proof rules for single boolean variables
 select * from test_predtest($$
 select x, x
@@ -837,3 +840,556 @@ w_i_holds         | f
 s_r_holds         | f
 w_r_holds         | f
 
+-- For the next several tests, we want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array expressions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- For tests using constant arrays the planner can inspect (i.e., not hidden
+-- behind an optimization fence like a CTE) we include NULL in our array
+-- for completeness's sake *unless* testing ALL since that would there it
+-- would force the expression to NULL which makes it generally less useful
+-- as a test case.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is not null, x != all(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is not null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select y is not null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is not null, x = any(opaque_array(array[]::int[]))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x is not null, x = all(opaque_array(array[]::int[]))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is not null, strictf(true, x = any(opaque_array(array[]::int[])))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is not null, strictf(true, x = all(null::int[]))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | t
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x is null, x != all(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x is null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select y is null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select (x + x) is null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x is null, x = any(opaque_array(array[]::int[]))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x is null, x = all(opaque_array(array[]::int[]))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is null, strictf(true, x = any(opaque_array(array[]::int[])))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x is null, strictf(true, x = all(null::int[]))
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | t
+weak_refuted_by   | t
+s_i_holds         | t
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x != all(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
+]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+), y is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+), (x + x) is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = any(opaque_array(array[]::int[])), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | f
+s_r_holds         | t
+w_r_holds         | t
+
+select * from test_predtest($$
+select x = all(opaque_array(array[]::int[])), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | f
+s_i_holds         | t
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | f
+
+select * from test_predtest($$
+select x = all(null::int[]), x is null
+from integers
+$$);
+-[ RECORD 1 ]-----+--
+strong_implied_by | f
+weak_implied_by   | f
+strong_refuted_by | f
+weak_refuted_by   | t
+s_i_holds         | f
+w_i_holds         | t
+s_r_holds         | f
+w_r_holds         | t
+
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql b/src/test/modules/test_predtest/sql/test_predtest.sql
index 2734735843..67160f3d83 100644
--- a/src/test/modules/test_predtest/sql/test_predtest.sql
+++ b/src/test/modules/test_predtest/sql/test_predtest.sql
@@ -25,6 +25,10 @@ from generate_series(0, 11*11-1) i;
 create function strictf(bool, bool) returns bool
 language plpgsql as $$begin return $1 and not $2; end$$ strict;
 
+-- a simple function make arrays opaque to the optimizer
+create function opaque_array(int[]) returns int[]
+language plpgsql as $$begin return $1; end$$ strict;
+
 -- Basic proof rules for single boolean variables
 
 select * from test_predtest($$
@@ -325,3 +329,268 @@ select * from test_predtest($$
 select x <= y, x = any(array[1,3,y])
 from integers
 $$);
+
+-- For the next several tests, we want to test an array longer than
+-- MAX_SAOP_ARRAY_SIZE so that we're not relying on predtest turning the
+-- array op into set of AND/OR quals. We also need test empty arrays since
+-- they are the case that deviates from this being merely a check for the
+-- clause being strict.
+--
+-- We also want to include test cases for non-constant array expressions.
+--
+-- Tests employing (x + x) to verify we don't infer implication or
+-- refutation of "strict_func(x) IS [NOT] NULL" from "x IS [NOT] NULL"
+-- will show *_holds=t even when *_by=f because int4pl never returns
+-- NULL unless an operand is NULL, but we can't assume this is true
+-- of all functions.
+--
+-- For tests using constant arrays the planner can inspect (i.e., not hidden
+-- behind an optimization fence like a CTE) we include NULL in our array
+-- for completeness's sake *unless* testing ALL since that would there it
+-- would force the expression to NULL which makes it generally less useful
+-- as a test case.
+--
+-- ScalarArrayOp implies IS NOT NULL
+select * from test_predtest($$
+select x is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select x is not null, x != all(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select x is not null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+
+select * from test_predtest($$
+select y is not null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is not null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+
+select * from test_predtest($$
+select x is not null, x = any(opaque_array(array[]::int[]))
+from integers
+$$);
+
+select * from test_predtest($$
+select x is not null, x = all(opaque_array(array[]::int[]))
+from integers
+$$);
+
+select * from test_predtest($$
+select x is not null, strictf(true, x = any(opaque_array(array[]::int[])))
+from integers
+$$);
+
+select * from test_predtest($$
+select x is not null, strictf(true, x = all(null::int[]))
+from integers
+$$);
+
+-- ScalarArrayOp refutes IS NULL
+select * from test_predtest($$
+select x is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, x != all(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select y is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is null, x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+])
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+
+select * from test_predtest($$
+select y is null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+
+select * from test_predtest($$
+select (x + x) is null, x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+)
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, x = any(opaque_array(array[]::int[]))
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, x = all(opaque_array(array[]::int[]))
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, strictf(true, x = any(opaque_array(array[]::int[])))
+from integers
+$$);
+
+select * from test_predtest($$
+select x is null, strictf(true, x = all(null::int[]))
+from integers
+$$);
+
+-- Strict predicate weakly refutes ScalarArrayOp
+--
+-- We get w_i_holds despite not declaring weak_implied_by, but
+-- a simple case (not tested here) shows why this isn't desired:
+-- "(NULL IS NULL)" is true while "= ANY(ARRAY[])" is false.
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x != all(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
+]), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(array[
+  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
+  29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,
+  54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
+  79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,null
+]), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+), y is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(
+  (select array_agg(i) || null from generate_series(1, 101) t(i))::int[]
+), (x + x) is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = any(opaque_array(array[]::int[])), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = all(opaque_array(array[]::int[])), x is null
+from integers
+$$);
+
+select * from test_predtest($$
+select x = all(null::int[]), x is null
+from integers
+$$);

#38

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#37)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

[ saop_is_not_null-v10.patch ]

I went through this again, and this time (after some more rewriting
of the comments) I satisfied myself that the logic is correct.
Hence, pushed.

I stripped down the regression test cases somewhat. Those were
good for development, but they were about doubling the runtime of
test_predtest.sql, which seemed excessive for testing one feature.

I also tweaked it to recognize the case where we can prove the
array, rather than the scalar, to be null. I'm not sure how useful
that is in practice, but it seemed weird that we'd exploit that
only if we can also prove the scalar to be null.

I've implemented this fix as suggested. The added argument I've
initially called "allow_false"; I don't love the name, but it is
directly what it does. I'd appreciate other suggestions (if you prefer
it change).

I was tempted to rename it to "weak", but decided that that might be
overloading that term too much in this module. Left it as-is.

Ideally I think this case would be handled elsewhere in the optimizer
by directly replacing the saop and a constant NULL array with NULL,
but this isn't the patch to do that, and at any rate I'd have to do a
bit more investigation to understand how to do such (if it's easily
possible).

Take a look at the ScalarArrayOp case in eval_const_expressions.
Right now it's only smart about the all-inputs-constant case.
I'm not really convinced it's worth spending cycles on the constant-
null-array case, but that'd be where to do it if we want to do it
in a general way. (I think that what I added to clause_is_strict_for
is far cheaper, because while it's the same test, it's in a place
that we won't hit during most queries.)

I also updated the tests with a new helper function "opaque_array" to
make it easy to test cases where the planner can't inspect the array.

Yeah, that's a win. I used that in most of the tests that I left in
place, but I kept a couple with long arrays just so we'd have code
coverage of the parts of clause_is_strict_for that need to detect
array size.

regards, tom lane

#39

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#38)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Fri, Mar 1, 2019 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:

[ saop_is_not_null-v10.patch ]

I went through this again, and this time (after some more rewriting
of the comments) I satisfied myself that the logic is correct.
Hence, pushed.

Thanks!

I also tweaked it to recognize the case where we can prove the
array, rather than the scalar, to be null. I'm not sure how useful
that is in practice, but it seemed weird that we'd exploit that
only if we can also prove the scalar to be null.

Just for my own understanding: I thought the "if
(arrayconst->constisnull)" took care of the array constant being null?
I don't see a check on the scalar node / lhs. I do see you added a
check for the entire clause being null, but I'm not sure I understand
when that would occur (unless it's only in the recursive case?)

Take a look at the ScalarArrayOp case in eval_const_expressions.
Right now it's only smart about the all-inputs-constant case.
I'm not really convinced it's worth spending cycles on the constant-
null-array case, but that'd be where to do it if we want to do it
in a general way. (I think that what I added to clause_is_strict_for
is far cheaper, because while it's the same test, it's in a place
that we won't hit during most queries.)

Thanks for the pointer; I'll take a look if for no other reason than curiosity.

Thanks,
James Coleman

#40

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: James Coleman (#39)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

James Coleman <jtc331@gmail.com> writes:

On Fri, Mar 1, 2019 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I also tweaked it to recognize the case where we can prove the
array, rather than the scalar, to be null. I'm not sure how useful
that is in practice, but it seemed weird that we'd exploit that
only if we can also prove the scalar to be null.

Just for my own understanding: I thought the "if
(arrayconst->constisnull)" took care of the array constant being null?

Yeah, but only after we've already matched the subexpr to the LHS,
otherwise we'd not be bothering to determine the array's size.

On the other hand, if for some reason we know that the array side
is NULL, we can conclude that the ScalarArrayOp returns NULL
independently of what the LHS is.

I don't see a check on the scalar node / lhs. I do see you added a
check for the entire clause being null, but I'm not sure I understand
when that would occur (unless it's only in the recursive case?)

Yeah, it's to handle the case where we run into a constant NULL
below a strict node. Typically, that doesn't happen because
eval_const_expressions would have const-folded the upper node, but
it's not impossible given all the cases that clause_is_strict_for
can recurse through now. (The coverage bot does show that code being
reached, although perhaps that only occurs for null-below-ScalarArrayOp,
in which case it might be dead if we were to make eval_const_expressions
smarter.)

regards, tom lane

#41

James Coleman

jtc331@gmail.com

almost 7 years ago

In reply to: Tom Lane (#40)

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

On Sat, Mar 2, 2019 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:

On Fri, Mar 1, 2019 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I also tweaked it to recognize the case where we can prove the
array, rather than the scalar, to be null. I'm not sure how useful
that is in practice, but it seemed weird that we'd exploit that
only if we can also prove the scalar to be null.

Just for my own understanding: I thought the "if
(arrayconst->constisnull)" took care of the array constant being null?

Yeah, but only after we've already matched the subexpr to the LHS,
otherwise we'd not be bothering to determine the array's size.

On the other hand, if for some reason we know that the array side
is NULL, we can conclude that the ScalarArrayOp returns NULL
independently of what the LHS is.

Thanks for the detailed explanation.

I don't see a check on the scalar node / lhs. I do see you added a
check for the entire clause being null, but I'm not sure I understand
when that would occur (unless it's only in the recursive case?)

Yeah, it's to handle the case where we run into a constant NULL
below a strict node. Typically, that doesn't happen because
eval_const_expressions would have const-folded the upper node, but
it's not impossible given all the cases that clause_is_strict_for
can recurse through now. (The coverage bot does show that code being
reached, although perhaps that only occurs for null-below-ScalarArrayOp,
in which case it might be dead if we were to make eval_const_expressions
smarter.)

Ah, that makes sense.

James Coleman