Convert NOT IN sublinks to anti-joins when safe
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.
Attached is a draft patch for this attempt (part of the code is
adapted from an old patch [1]/messages/by-id/13766.1405037879@sss.pgh.pa.us by David and Tom). This patch aims for
a conservative implementation: the goal is not to handle every
theoretical case, but to handle canonical query patterns with minimal
code complexity.
The patch primarily targets patterns like:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);
... and
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM
banned_users WHERE user_id IS NOT NULL);
This is a very typical syntax for exclusion. In well-modeled
databases, join keys like id and user_id are very likely to be defined
as NOT NULL.
It seems to me that the ROI here is quite positive: the added code
complexity is very low (thanks to the existing infrastructure), while
the benefit is that users writing this typical pattern will finally
get efficient anti-join plans without needing manual rewrites.
(For the outer expressions, we could potentially also use outer query
quals to prove non-nullability. This patch does not attempt to do so.
Implementing this would require passing state down during the
pull_up_sublinks recursion; and given that find_nonnullable_vars can
fail to prove non-nullability in many cases due to the lack of
const-simplification at this stage, I'm not sure whether it is worth
the code complexity. Besides, I haven't fully convinced myself that
doing this does not introduce correctness issues.)
Any thoughts?
[1]: /messages/by-id/13766.1405037879@sss.pgh.pa.us
- Richard
Attachments:
v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchapplication/octet-stream; name=v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchDownload+1077-65
Hi Richard
I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.
Thank you for working on this.Indeed, the benefits are substantial and
highly necessary, as Oracle, SQL Server, and MySQL have all implemented
varying degrees of support.I shall test this path in my spare time.
Thanks ,
On Tue, Feb 3, 2026 at 3:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
Show quoted text
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.Attached is a draft patch for this attempt (part of the code is
adapted from an old patch [1] by David and Tom). This patch aims for
a conservative implementation: the goal is not to handle every
theoretical case, but to handle canonical query patterns with minimal
code complexity.The patch primarily targets patterns like:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);
... and
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM
banned_users WHERE user_id IS NOT NULL);This is a very typical syntax for exclusion. In well-modeled
databases, join keys like id and user_id are very likely to be defined
as NOT NULL.It seems to me that the ROI here is quite positive: the added code
complexity is very low (thanks to the existing infrastructure), while
the benefit is that users writing this typical pattern will finally
get efficient anti-join plans without needing manual rewrites.(For the outer expressions, we could potentially also use outer query
quals to prove non-nullability. This patch does not attempt to do so.
Implementing this would require passing state down during the
pull_up_sublinks recursion; and given that find_nonnullable_vars can
fail to prove non-nullability in many cases due to the lack of
const-simplification at this stage, I'm not sure whether it is worth
the code complexity. Besides, I haven't fully convinced myself that
doing this does not introduce correctness issues.)Any thoughts?
[1] /messages/by-id/13766.1405037879@sss.pgh.pa.us
- Richard
Hi Richard
That's just brilliant! Here’s my testing process, which also tackles a
commonly criticized aspect of PostgreSQL, effectively addressing its
shortcomings.Moreover, the path test scenarios are thorough, encompassing
cases that involve "NOT IN".Once again, thank you for your hard work.
###########
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join2 values (
generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
insert into join2 values (
generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
insert into join2 values (
generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
insert into join2 values (
generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;
postgres=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 WHERE T2.ID>10000);
QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=20060.65..58729.40 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..17904.72
rows=862371 width=4)
Index Cond: (id > 10000)
(5 rows)
postgres=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 WHERE T2.ID>10000 and t2.id is not null);
QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=22216.57..60885.32 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..20060.65
rows=862371 width=4)
Index Cond: ((id > 10000) AND (id IS NOT NULL))
(5 rows)
postgres=# alter table join1 alter id set not null;
ALTER TABLE
postgres=# alter table join2 alter id set not null;
ALTER TABLE
postgres=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 WHERE T2.ID>10000 );
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Anti Join (cost=28684.35..72424.36 rows=393902 width=8)
Hash Cond: (t1.id = t2.id)
-> Seq Scan on join1 t1 (cost=0.00..35518.00 rows=1260300 width=8)
-> Hash (cost=17904.72..17904.72 rows=862371 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..17904.72
rows=862371 width=4)
Index Cond: (id > 10000)
(6 rows)
postgres=#
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 where t2.id is not null);
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=27134.57..70874.58 rows=393902 width=8) (actual
time=239.777..602.433 rows=400200.00 loops=1)
Hash Cond: (t1.id = t2.id)
Buffers: shared hit=25304
-> Seq Scan on join1 t1 (cost=0.00..35518.00 rows=1260300 width=8)
(actual time=0.007..85.255 rows=1260300.00 loops=1)
Buffers: shared hit=22915
-> Hash (cost=16108.32..16108.32 rows=882100 width=4) (actual
time=220.949..220.951 rows=882100.00 loops=1)
Buckets: 1048576 Batches: 1 Memory Usage: 39204kB
Buffers: shared hit=2389
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..16108.32
rows=882100 width=4) (actual time=0.008..103.306 rows=882100.00 loops=1)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=2389
Planning:
Buffers: shared hit=12
Planning Time: 0.193 ms
Execution Time: 617.668 ms
(16 rows)
postgres=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 );
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Anti Join (cost=27134.57..70874.58 rows=393902 width=8)
Hash Cond: (t1.id = t2.id)
-> Seq Scan on join1 t1 (cost=0.00..35518.00 rows=1260300 width=8)
-> Hash (cost=16108.32..16108.32 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..16108.32
rows=882100 width=4)
(5 rows)
postgres=# alter table join2 alter id drop not null ;
ALTER TABLE
postgres=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 );
QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=18313.57..56982.32 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..16108.32
rows=882100 width=4)
(4 rows)
postgres=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where t2.id is not null);
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Anti Join (cost=29339.83..73079.83 rows=393902 width=8)
Hash Cond: (t1.id = t2.id)
-> Seq Scan on join1 t1 (cost=0.00..35518.00 rows=1260300 width=8)
-> Hash (cost=18313.58..18313.58 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..18313.58
rows=882100 width=4)
Index Cond: (id IS NOT NULL)
(6 rows)
postgres=# insert into join2 values (
generate_series(950300,1950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
INSERT 0 1000001
postgres=# analyze join2;
ANALYZE
postgres=#
postgres=# insert into join2 values (
generate_series(1950300,3950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
INSERT 0 2000001
postgres=# analyze join2;
ANALYZE
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 where t2.id is not null);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.86..73003.87 rows=1 width=8) (actual
time=348.632..371.158 rows=200.00 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=99166 read=3398
-> Merge Anti Join (cost=0.86..72003.77 rows=1 width=8) (actual
time=177.574..334.714 rows=50.00 loops=4)
Merge Cond: (t1.id = t2.id)
Buffers: shared hit=99166 read=3398
-> Parallel Index Scan using idx_j1 on join1 t1
(cost=0.43..37380.06 rows=406548 width=8) (actual time=0.090..81.338
rows=315075.00 loops=4)
Index Searches: 1
Buffers: shared hit=85241 read=3398
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.43..80682.41
rows=3882102 width=4) (actual time=0.049..145.187 rows=1281766.50 loops=4)
Index Cond: (id IS NOT NULL)
Heap Fetches: 0
Index Searches: 4
Buffers: shared hit=13925
Planning:
Buffers: shared hit=12
Planning Time: 0.191 ms
Execution Time: 371.191 ms
postgres=# set work_mem ='128MB';
SET
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 );
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=80682.42..119351.17 rows=630150 width=8)
(actual time=1477.040..1835.205 rows=200.00 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
Rows Removed by Filter: 1260100
Buffers: shared hit=33502
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.43..70977.16
rows=3882102 width=4) (actual time=0.028..421.350 rows=3882102.00 loops=1)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=10587
Planning Time: 0.083 ms
Execution Time: 1845.041 ms
(11 rows)
postgres=# set work_mem ='8MB';
SET
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 );
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=80682.42..119351.17 rows=630150 width=8)
(actual time=1468.254..1825.373 rows=200.00 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
Rows Removed by Filter: 1260100
Buffers: shared hit=33502
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.43..70977.16
rows=3882102 width=4) (actual time=0.025..425.249 rows=3882102.00 loops=1)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=10587
Planning Time: 0.085 ms
Execution Time: 1835.148 ms
(11 rows)
postgres=# set work_mem ='128MB';
SET
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 );
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=80682.42..119351.17 rows=630150 width=8)
(actual time=1477.040..1835.205 rows=200.00 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
Rows Removed by Filter: 1260100
Buffers: shared hit=33502
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.43..70977.16
rows=3882102 width=4) (actual time=0.028..421.350 rows=3882102.00 loops=1)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=10587
Planning Time: 0.083 ms
Execution Time: 1845.041 ms
(11 rows)
postgres=# set work_mem ='8MB';
SET
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 );
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=80682.42..119351.17 rows=630150 width=8)
(actual time=1468.254..1825.373 rows=200.00 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1)))
Rows Removed by Filter: 1260100
Buffers: shared hit=33502
SubPlan any_1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.43..70977.16
rows=3882102 width=4) (actual time=0.025..425.249 rows=3882102.00 loops=1)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=10587
Planning Time: 0.085 ms
Execution Time: 1835.148 ms
(11 rows)
postgres=# set work_mem ='2MB';
SET
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 );
^CCancel request sent
ERROR: canceling statement due to user request
postgres=# ^C
postgres=#
postgres=# set work_mem ='2MB';
SET
postgres=# select now();
now
-------------------------------
2026-02-04 11:31:42.957009+08
(1 row)
postgres=# select now();
now
-------------------------------
2026-02-04 11:32:10.811249+08
(1 row)
postgres=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id
NOT IN (SELECT T2.id FROM join2 t2 );
^CCancel request sent
ERROR: canceling statement due to user request
postgres=# select now();
now
-------------------------------
2026-02-04 11:42:29.698854+08
(1 row)
postgres=#
Thanks
On Tue, Feb 3, 2026 at 5:41 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Show quoted text
Hi Richard
I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.Thank you for working on this.Indeed, the benefits are substantial
and highly necessary, as Oracle, SQL Server, and MySQL have all implemented
varying degrees of support.I shall test this path in my spare time.Thanks ,
On Tue, Feb 3, 2026 at 3:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.Attached is a draft patch for this attempt (part of the code is
adapted from an old patch [1] by David and Tom). This patch aims for
a conservative implementation: the goal is not to handle every
theoretical case, but to handle canonical query patterns with minimal
code complexity.The patch primarily targets patterns like:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);
... and
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM
banned_users WHERE user_id IS NOT NULL);This is a very typical syntax for exclusion. In well-modeled
databases, join keys like id and user_id are very likely to be defined
as NOT NULL.It seems to me that the ROI here is quite positive: the added code
complexity is very low (thanks to the existing infrastructure), while
the benefit is that users writing this typical pattern will finally
get efficient anti-join plans without needing manual rewrites.(For the outer expressions, we could potentially also use outer query
quals to prove non-nullability. This patch does not attempt to do so.
Implementing this would require passing state down during the
pull_up_sublinks recursion; and given that find_nonnullable_vars can
fail to prove non-nullability in many cases due to the lack of
const-simplification at this stage, I'm not sure whether it is worth
the code complexity. Besides, I haven't fully convinced myself that
doing this does not introduce correctness issues.)Any thoughts?
[1] /messages/by-id/13766.1405037879@sss.pgh.pa.us
- Richard
Hi,
On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux@gmail.com>, wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
Thanks for working on this!
I've reviewed the patch and it looks good overall.
I noticed several minor issues in the test case comments:
1. The comment doesn't match the SQL:
+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable
+-- by qual clause
+SELECT * FROM not_null_tab
+WHERE id NOT IN (
+ SELECT t1.id
+ FROM null_tab t1
+ INNER JOIN null_tab t2 ON t1.id = t2.id
+ LEFT JOIN null_tab t3 ON TRUE
+);
The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.
2. Another test case that could use a more precise comment:
+-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join,
+-- inner side is defined NOT NULL
+SELECT * FROM not_null_tab t1
+LEFT JOIN not_null_tab t2
+ON t2.id NOT IN (SELECT id FROM not_null_tab);
Correct me if I’m wrong.
This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding.
The current comment is technically correct but might be clearer as:
-- ANTI JOIN: outer side(t2) is defined NOT NULL.
— ON clause is evaluated on actual t2 rows before LEFT JOIN's
-- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL
3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:
-- ANTI JOIN: inner side comes from nullable side of outer join
-- but is forced non-nullable by WHERE clause
EXPLAIN (COSTS OFF)
SELECT * FROM not_null_tab
WHERE id NOT IN (
SELECT t2.id
FROM not_null_tab t1
LEFT JOIN not_null_tab t2 ON t1.id = t2.id
WHERE t2.id IS NOT NULL
);
The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side).
If I read the code correctly, this should work via find_subquery_safe_quals + find_nonnullable_vars, but explicit coverage would be good.
And I test it:
QUERY PLAN
----------------------------------------------------
Hash Anti Join
Hash Cond: (not_null_tab.id = t2.id)
-> Seq Scan on not_null_tab
-> Hash
-> Merge Join
Merge Cond: (t2.id = t1.id)
-> Sort
Sort Key: t2.id
-> Seq Scan on null_tab t2
Filter: (id IS NOT NULL)
-> Sort
Sort Key: t1.id
-> Seq Scan on null_tab t1
(13 rows)
--
Zhang Mingli
HashData
Zhang Mingli <zmlpostgres@gmail.com> 于2026年2月4日周三 12:50写道:
Hi,
On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux@gmail.com>, wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.Thanks for working on this!
I've reviewed the patch and it looks good overall.I noticed several minor issues in the test case comments:
1. The comment doesn't match the SQL:
+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +);The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.
I guess that it means "t1.id = t2.id". This join clause makes t1.id
forced non-nullable.
--
Thanks,
Tender Wang
On Tue, Feb 3, 2026 at 4:12 PM Richard Guo <guofenglinux@gmail.com> wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
I've noticed a loose end in the v1 patch.
The semantic gap between NOT IN and anti-join actually exists whenever
the operator returns NULL. For NOT IN, if (A op B) returns NULL, then
NOT (NULL) evaluates to NULL (effectively false), and the row is
discarded. In contrast, for an anti-join, if (A op B) returns NULL,
it implies no match was found, and the anti-join logic dictates that
the row should be kept.
To guarantee that (A op B) never returns NULL, the current patch
verifies that both A and B are non-nullable. However, this is not
sufficient. The "op" might be an operator that returns NULL on
non-null inputs.
On the other hand, if "op" does not return NULL on NULL inputs, like
IS DISTINCT FROM, we technically would not even need to require that A
and B are non-nullable.
Is there a convenient way to verify that an operator never returns
NULL on non-null inputs? Would it be sufficient to insist that the
operator belongs to btree opclass (assuming that the strict ordering
requirements of btree imply this safety)?
And, is it worth checking if an operator never returns NULL even on
NULL inputs? If we can identify such operators, we should be able to
remove the requirement that both sides of NOT IN must be non-nullable.
Is there a convenient way to check for such operators?
- Richard
Hi!
On 04.02.2026 10:47, Richard Guo wrote:
On Tue, Feb 3, 2026 at 4:12 PM Richard Guo <guofenglinux@gmail.com> wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
I used to play around with query rewrites of the same form, turning NOT
IN into NOT EXISTS. As you rightfully pointed out, the straight forward
rewrite only works if neither the outer expression nor the sub-query
cannot yield NULLs, limiting the optimization considerably. Both cases
can be fixed by amending the basic form of the rewrite.
Basic form:
SELECT t1.c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2)
=>
SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)
If the sub-select can yield NULLs, the rewrite can be fixed by adding an
OR t2.c1 IS NULL clause, such as:
SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)
which is equivalent to the following SQL which avoids the OR:
SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
If the outer expression can yield NULLs, the rewrite can be fixed by
adding a t1.c1 IS NOT NULL clause, such as:
SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)
Both fixes can of course be combined yielding:
SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
What's our today's take on doing more involved transformations inside
the planner to support such cases? It would greatly open up the scope of
the optimization.
I've noticed a loose end in the v1 patch.
The semantic gap between NOT IN and anti-join actually exists whenever
the operator returns NULL. For NOT IN, if (A op B) returns NULL, then
NOT (NULL) evaluates to NULL (effectively false), and the row is
discarded. In contrast, for an anti-join, if (A op B) returns NULL,
it implies no match was found, and the anti-join logic dictates that
the row should be kept.To guarantee that (A op B) never returns NULL, the current patch
verifies that both A and B are non-nullable. However, this is not
sufficient. The "op" might be an operator that returns NULL on
non-null inputs.On the other hand, if "op" does not return NULL on NULL inputs, like
IS DISTINCT FROM, we technically would not even need to require that A
and B are non-nullable.Is there a convenient way to verify that an operator never returns
NULL on non-null inputs? Would it be sufficient to insist that the
operator belongs to btree opclass (assuming that the strict ordering
requirements of btree imply this safety)?
There's lots of code that e.g. calls FunctionCall2[Coll]() on operators,
where that function raises an ERROR in case the returned value is NULL.
Hence, I would say it's safe to make that assumption.
Otherwise, you could restrict the check to only builtin operators. For
those we know they have the expected semantics.
--
David Geier
On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav.pg@gmail.com> wrote:
If the sub-select can yield NULLs, the rewrite can be fixed by adding an
OR t2.c1 IS NULL clause, such as:SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)
I'm not sure if this rewrite results in a better plan. The OR clause
would force a nested loop join, which could be much slower than a
hashed-subplan plan.
If the outer expression can yield NULLs, the rewrite can be fixed by
adding a t1.c1 IS NOT NULL clause, such as:SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)
This rewrite doesn't seem correct to me. If t2 is empty, you would
incorrectly lose the NULL rows from t1 in the final result.
What's our today's take on doing more involved transformations inside
the planner to support such cases? It would greatly open up the scope of
the optimization.
As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable. This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in real world.
- Richard
On Wed, Feb 4, 2026 at 6:47 PM Richard Guo <guofenglinux@gmail.com> wrote:
Is there a convenient way to verify that an operator never returns
NULL on non-null inputs? Would it be sufficient to insist that the
operator belongs to btree opclass (assuming that the strict ordering
requirements of btree imply this safety)?
I think we can insist that the operator be a member of a btree or hash
opfamily. Btree operators must adhere to strict total order, and hash
operators must adhere to strict equality; if they return NULL for
non-null inputs, the indexes themselves would be corrupt.
I'm less confident about other access methods like gist or gin. Their
semantics can be more flexible, and using such operators in a NOT IN
clause is quite rare.
Attached is the updated patch, which adds the check requiring the
operator to be a member of a btree or hash opfamily.
And, is it worth checking if an operator never returns NULL even on
NULL inputs? If we can identify such operators, we should be able to
remove the requirement that both sides of NOT IN must be non-nullable.
Is there a convenient way to check for such operators?
I don't know how to check for such operators, so I didn't do it in the
patch.
- Richard
Attachments:
v2-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchapplication/octet-stream; name=v2-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchDownload+1186-65
HI Richard
As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable. This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in real world.
Agree +1 ,The current path already covers common scenarios and is no less
comprehensive than other databases.I'm already quite pleased that it can be
merged.
Having tested a certain widely used open-source database, I found it unable
to process the following query: `SELECT * FROM join1 WHERE id NOT IN
(SELECT id FROM join2 WHERE id IS NOT NULL);` Note that join2 allows null
values for id.
Thanks
On Thu, Feb 5, 2026 at 2:09 PM Richard Guo <guofenglinux@gmail.com> wrote:
Show quoted text
On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav.pg@gmail.com> wrote:
If the sub-select can yield NULLs, the rewrite can be fixed by adding an
OR t2.c1 IS NULL clause, such as:SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)I'm not sure if this rewrite results in a better plan. The OR clause
would force a nested loop join, which could be much slower than a
hashed-subplan plan.If the outer expression can yield NULLs, the rewrite can be fixed by
adding a t1.c1 IS NOT NULL clause, such as:SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)This rewrite doesn't seem correct to me. If t2 is empty, you would
incorrectly lose the NULL rows from t1 in the final result.What's our today's take on doing more involved transformations inside
the planner to support such cases? It would greatly open up the scope of
the optimization.As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable. This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in real world.- Richard
On Wed, Feb 4, 2026 at 1:49 PM Zhang Mingli <zmlpostgres@gmail.com> wrote:
1. The comment doesn't match the SQL:
+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +);The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.
The qual "t1.id = t2.id" here is sufficient to force t1.id (and t2.id)
nonnullable, as the operator is strict. It doesn't have to be an
explicit IS NOT NULL qual.
2. Another test case that could use a more precise comment:
+-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab);Correct me if I’m wrong.
This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding.
The current comment is technically correct but might be clearer as:-- ANTI JOIN: outer side(t2) is defined NOT NULL.
— ON clause is evaluated on actual t2 rows before LEFT JOIN's
-- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL
Hmm, I'm not sure if this is necessary. I don't think this test case
needs to concern itself with explaining standard JOIN/ON semantics.
3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:
-- ANTI JOIN: inner side comes from nullable side of outer join
-- but is forced non-nullable by WHERE clause
EXPLAIN (COSTS OFF)
SELECT * FROM not_null_tab
WHERE id NOT IN (
SELECT t2.id
FROM not_null_tab t1
LEFT JOIN not_null_tab t2 ON t1.id = t2.id
WHERE t2.id IS NOT NULL
);The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side).
Right, we can include this one in the test case.
- Richard
On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is the updated patch, which adds the check requiring the
operator to be a member of a btree or hash opfamily.
Attached is another updated patch rebased on current master, with the
addition of support for RowCompareExpr to handle multi-column ordering
operations; otherwise unchanged.
- Richard
Attachments:
v3-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchapplication/octet-stream; name=v3-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchDownload+1272-67
On 05.02.2026 07:09, Richard Guo wrote:
On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav.pg@gmail.com> wrote:
If the sub-select can yield NULLs, the rewrite can be fixed by adding an
OR t2.c1 IS NULL clause, such as:SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)I'm not sure if this rewrite results in a better plan. The OR clause
would force a nested loop join, which could be much slower than a
hashed-subplan plan.
That's why I had shared a variant that doesn't have the OR but a instead
uses a second NOT EXISTS:
SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
If the outer expression can yield NULLs, the rewrite can be fixed by
adding a t1.c1 IS NOT NULL clause, such as:SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)This rewrite doesn't seem correct to me. If t2 is empty, you would
incorrectly lose the NULL rows from t1 in the final result.
Yes, that rewrite was only for the case where the outer expression can
yield NULLs but the sub-query cannot.
The very last rewrite combines both cases. The rewritten query then
looks like:
SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
What's our today's take on doing more involved transformations inside
the planner to support such cases? It would greatly open up the scope of
the optimization.As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable. This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in real world.
Seems reasonable to start with the non-NULL variant, though there are
certainly cases where there's no PK / unique index on the relevant columns.
--
David Geier
On Mon, Mar 2, 2026 at 9:50 PM David Geier <geidav.pg@gmail.com> wrote:
The very last rewrite combines both cases. The rewritten query then
looks like:SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
I'm still not convinced this rewrite is correct. As I mentioned
earlier, it breaks down if t2 is empty while t1 contains NULL rows.
For example:
CREATE TABLE t1 (c1 int);
CREATE TABLE t2 (c1 int);
INSERT INTO t1 VALUES (1), (NULL);
SELECT t1.c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);
c1
----
1
(2 rows)
SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL);
c1
----
1
(1 row)
Seems reasonable to start with the non-NULL variant, though there are
certainly cases where there's no PK / unique index on the relevant columns.
Yeah. I don't know how to optimize nullable NOT IN clauses. It seems
quite difficult to handle safely purely via query transformations.
Maybe we can explore adding a dedicated Null-Aware Anti-Join execution
node, much like Oracle's approach. But that is definitely beyond the
scope of this current patch.
- Richard
On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is the updated patch, which adds the check requiring the
operator to be a member of a btree or hash opfamily.
Attached is another updated patch rebased on current master, with the
addition of support for RowCompareExpr to handle multi-column ordering
operations; otherwise unchanged.
Attached is another updated patch rebased on current master, with some
minor cosmetic adjustments; nothing essential has changed.
- Richard
Attachments:
v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchapplication/octet-stream; name=v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchDownload+1272-67
Hi, Richard
On Wed, 04 Mar 2026 at 18:52, Richard Guo <guofenglinux@gmail.com> wrote:
On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is the updated patch, which adds the check requiring the
operator to be a member of a btree or hash opfamily.Attached is another updated patch rebased on current master, with the
addition of support for RowCompareExpr to handle multi-column ordering
operations; otherwise unchanged.Attached is another updated patch rebased on current master, with some
minor cosmetic adjustments; nothing essential has changed.
Thank you for working on this!
Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 299b3354f6d..0d31861da7f 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1484,7 +1484,6 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink)
{
Node *testexpr = sublink->testexpr;
List *outer_exprs = NIL;
- ListCell *lc;
/* Punt if sublink is not in the expected format */
if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL)
@@ -1514,10 +1513,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink)
/* multi-column equality or inequality checks */
BoolExpr *bexpr = (BoolExpr *) testexpr;
- foreach(lc, bexpr->args)
+ foreach_ptr(OpExpr, opexpr, bexpr->args)
{
- OpExpr *opexpr = (OpExpr *) lfirst(lc);
-
if (!IsA(opexpr, OpExpr))
return false;
@@ -1537,10 +1534,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink)
/* multi-column ordering checks */
RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr;
- foreach(lc, rcexpr->opnos)
+ foreach_oid(opno, rcexpr->opnos)
{
- Oid opno = lfirst_oid(lc);
-
/* verify operator safety; see comment above */
if (!op_is_safe_index_member(opno))
return false;
@@ -1566,10 +1561,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink)
flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs);
/* Check that every outer expression is non-nullable */
- foreach(lc, outer_exprs)
+ foreach_ptr(Expr, expr, outer_exprs)
{
- Expr *expr = (Expr *) lfirst(lc);
-
/*
* We have already collected relation-level not-null constraints for
* the outer query, so we can consult the global hash table for
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index c47c9da4a9b..3f3baf2149a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2052,7 +2052,6 @@ query_outputs_are_not_nullable(Query *query)
List *safe_quals = NIL;
List *nonnullable_vars = NIL;
bool computed_nonnullable_vars = false;
- ListCell *tl;
/*
* If the query contains set operations, punt. The set ops themselves
@@ -2083,9 +2082,8 @@ query_outputs_are_not_nullable(Query *query)
/*
* Examine each targetlist entry to prove that it can't produce NULL.
*/
- foreach(tl, query->targetList)
+ foreach_ptr(TargetEntry, tle, query->targetList)
{
- TargetEntry *tle = (TargetEntry *) lfirst(tl);
Expr *expr = tle->expr;
/* Resjunk columns can be ignored: they don't produce output values */
@@ -2194,11 +2192,10 @@ find_subquery_safe_quals(Node *jtnode, List **safe_quals)
else if (IsA(jtnode, FromExpr))
{
FromExpr *f = (FromExpr *) jtnode;
- ListCell *lc;
/* All elements of the FROM list are allowable */
- foreach(lc, f->fromlist)
- find_subquery_safe_quals((Node *) lfirst(lc), safe_quals);
+ foreach_ptr(Node, node, f->fromlist)
+ find_subquery_safe_quals(node, safe_quals);
/* ... and its WHERE quals are too */
if (f->quals)
*safe_quals = lappend(*safe_quals, f->quals);
- Richard
[2. text/x-diff; v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch]...
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
HI Japin
Just a quick note: I think `foreach_ptr` is more appropriate here than
`foreach`.
Thank you review this path , I think using foreach_node should be safe
```
#define foreach_ptr(type, var, lst) foreach_internal(type, *, var, lst,
lfirst)
#define foreach_int(var, lst) foreach_internal(int, , var, lst, lfirst_int)
#define foreach_oid(var, lst) foreach_internal(Oid, , var, lst, lfirst_oid)
#define foreach_xid(var, lst) foreach_internal(TransactionId, , var, lst,
lfirst_xid)
/*
* The internal implementation of the above macros. Do not use directly.
*
* This macro actually generates two loops in order to declare two variables
of
* different types. The outer loop only iterates once, so we expect
optimizing
* compilers will unroll it, thereby optimizing it away.
*/
#define foreach_internal(type, pointer, var, lst, func) \
for (type pointer var = 0, pointer var##__outerloop = (type pointer) 1; \
var##__outerloop; \
var##__outerloop = 0) \
for (ForEachState var##__state = {(lst), 0}; \
(var##__state.l != NIL && \
var##__state.i < var##__state.l->length && \
(var = (type pointer) func(&var##__state.l->elements[var##__state.i]), true));
\
var##__state.i++)
/*
* foreach_node -
* The same as foreach_ptr, but asserts that the element is of the specified
* node type.
*/
#define foreach_node(type, var, lst) \
for (type * var = 0, *var##__outerloop = (type *) 1; \
var##__outerloop; \
var##__outerloop = 0) \
for (ForEachState var##__state = {(lst), 0}; \
(var##__state.l != NIL && \
var##__state.i < var##__state.l->length && \
(var = lfirst_node(type, &var##__state.l->elements[var##__state.i]), true));
\
var##__state.i++)
```
On Thu, Mar 5, 2026 at 9:58 AM Japin Li <japinli@hotmail.com> wrote:
Show quoted text
Hi, Richard
On Wed, 04 Mar 2026 at 18:52, Richard Guo <guofenglinux@gmail.com> wrote:
On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <guofenglinux@gmail.com>
wrote:
On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <guofenglinux@gmail.com>
wrote:
Attached is the updated patch, which adds the check requiring the
operator to be a member of a btree or hash opfamily.Attached is another updated patch rebased on current master, with the
addition of support for RowCompareExpr to handle multi-column ordering
operations; otherwise unchanged.Attached is another updated patch rebased on current master, with some
minor cosmetic adjustments; nothing essential has changed.Thank you for working on this!
Just a quick note: I think `foreach_ptr` is more appropriate here than
`foreach`.diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 299b3354f6d..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1484,7 +1484,6 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) { Node *testexpr = sublink->testexpr; List *outer_exprs = NIL; - ListCell *lc;/* Punt if sublink is not in the expected format */
if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL)
@@ -1514,10 +1513,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root,
SubLink *sublink)
/* multi-column equality or inequality checks */
BoolExpr *bexpr = (BoolExpr *) testexpr;- foreach(lc, bexpr->args) + foreach_ptr(OpExpr, opexpr, bexpr->args) { - OpExpr *opexpr = (OpExpr *) lfirst(lc); - if (!IsA(opexpr, OpExpr)) return false;@@ -1537,10 +1534,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root,
SubLink *sublink)
/* multi-column ordering checks */
RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr;- foreach(lc, rcexpr->opnos) + foreach_oid(opno, rcexpr->opnos) { - Oid opno = lfirst_oid(lc); - /* verify operator safety; see comment above */ if (!op_is_safe_index_member(opno)) return false; @@ -1566,10 +1561,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs);/* Check that every outer expression is non-nullable */ - foreach(lc, outer_exprs) + foreach_ptr(Expr, expr, outer_exprs) { - Expr *expr = (Expr *) lfirst(lc); - /* * We have already collected relation-level not-null constraints for * the outer query, so we can consult the global hash table for diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index c47c9da4a9b..3f3baf2149a 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2052,7 +2052,6 @@ query_outputs_are_not_nullable(Query *query) List *safe_quals = NIL; List *nonnullable_vars = NIL; bool computed_nonnullable_vars = false; - ListCell *tl;/* * If the query contains set operations, punt. The set ops themselves @@ -2083,9 +2082,8 @@ query_outputs_are_not_nullable(Query *query) /* * Examine each targetlist entry to prove that it can't produce NULL. */ - foreach(tl, query->targetList) + foreach_ptr(TargetEntry, tle, query->targetList) { - TargetEntry *tle = (TargetEntry *) lfirst(tl); Expr *expr = tle->expr;/* Resjunk columns can be ignored: they don't produce
output values */
@@ -2194,11 +2192,10 @@ find_subquery_safe_quals(Node *jtnode, List
**safe_quals)
else if (IsA(jtnode, FromExpr))
{
FromExpr *f = (FromExpr *) jtnode;
- ListCell *lc;/* All elements of the FROM list are allowable */ - foreach(lc, f->fromlist) - find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + foreach_ptr(Node, node, f->fromlist) + find_subquery_safe_quals(node, safe_quals); /* ... and its WHERE quals are too */ if (f->quals) *safe_quals = lappend(*safe_quals, f->quals);- Richard
[2. text/x-diff;
v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch]...
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
On Thu, Mar 5, 2026 at 10:57 AM Japin Li <japinli@hotmail.com> wrote:
Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`.
Agreed.
For homogeneous lists, it is better to use foreach_node so we get the
additional type-safety assertions in dev builds. For heterogeneous
lists, we should use foreach_ptr.
Attached is an updated version of the patch that replaces foreach with
foreach_ptr or foreach_node accordingly. Nothing else has changed.
I plan to do one more round of self-review on this. Unless there are
any further thoughts or concerns, I'm hoping to commit this in a week
or two. Please let me know if anyone spots anything else I might have
missed.
- Richard
Attachments:
v5-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchapplication/octet-stream; name=v5-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchDownload+1262-67
On Sun, Mar 8, 2026 at 5:16 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is an updated version of the patch that replaces foreach with
foreach_ptr or foreach_node accordingly. Nothing else has changed.I plan to do one more round of self-review on this. Unless there are
any further thoughts or concerns, I'm hoping to commit this in a week
or two. Please let me know if anyone spots anything else I might have
missed.
I just finished a final self-review and noticed a subtle issue in
query_outputs_are_not_nullable(). When preparing a target expression
for the non-nullability check, the previous code flattened join
aliases Vars before grouping Vars. However, because the parser
processes FROM/JOIN clauses before the GROUP BY clause, a grouping Var
can actually wrap a join alias Var, not the reverse. So we should
flatten grouping Vars first.
Attached is an updated patch that fixes that issue. I plan to commit
this in a week or two, barring any objections.
- Richard
Attachments:
v6-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchapplication/octet-stream; name=v6-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patchDownload+1264-67
On Sun, 08 Mar 2026 at 17:16, Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Mar 5, 2026 at 10:57 AM Japin Li <japinli@hotmail.com> wrote:
Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`.
Agreed.
For homogeneous lists, it is better to use foreach_node so we get the
additional type-safety assertions in dev builds. For heterogeneous
lists, we should use foreach_ptr.Attached is an updated version of the patch that replaces foreach with
foreach_ptr or foreach_node accordingly. Nothing else has changed.I plan to do one more round of self-review on this. Unless there are
any further thoughts or concerns, I'm hoping to commit this in a week
or two. Please let me know if anyone spots anything else I might have
missed.
Thanks for updating the patch. LGTM.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.