Exists pull-up application with JoinExpr

Started by Alena Rybakinaover 1 year ago24 messageshackers
Jump to latest
#1Alena Rybakina
a.rybakina@postgrespro.ru

Hi, hackers!

I found one pull-up that works if the inner join condition is written
through the where condition,

|create temp table ta (id int primary key, val int); insert into ta
values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|

|create temp table tb (id int primary key, aval int); insert into tb
values(4,1); insert into tb values(5,1); insert into tb values(1,2);
create temp table tc (id int primary key, aid int); insert into tc
values(6,1); insert into tc values(7,2);|

|EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id);|
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
Planning:
Buffers: shared hit=67 read=12
(14 rows)

but it doesn't work if it is written through the outside condition.

|alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id); QUERY PLAN
------------------------------------------------------ Seq Scan on ta
(actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by
Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual rows=0
loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual rows=0 loops=3)
Filter: (ta.id = id) Rows Removed by Filter: 3 Buffers: local hit=3 ->
Seq Scan on tc (actual rows=1 loops=1) Buffers: local hit=1 Planning:
Buffers: shared hit=16 read=9 (15 rows) |

|I have written a patch to add this functionality and now it gives an
query plan: |

|alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 SELECT *
   FROM ta
  WHERE EXISTS (SELECT *
                  FROM tb JOIN tc
                  ON ta.id = tb.id);
                     QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=1 loops=1)
   Buffers: local hit=6
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Nested Loop (actual rows=0 loops=3)
         Buffers: local hit=5
         ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
               Index Cond: (id = ta.id)
               Heap Fetches: 1
               Buffers: local hit=4
         ->  Seq Scan on tc (actual rows=1 loops=1)
               Buffers: local hit=1
(12 rows)|

tb and tc form a Cartesian product, but in the case of the intersection
condition with tuples from the table ta (ta.id = tb.id). So, according
to the join condition, tb intersects only with 1, and only it gets into
the result, but at the same time they appear twice - this is because of
the Cartesian product of tb with tc

|*How it works:*
|

I rewrote the code a bit so that it considers not only the quals in
jointree->quals, but also those in join expression
(subselect->jointree->fromlist). If they satisfy the conditions for
using pull up, I add them to the list of clauses and form a "Bool"
expression from them, joined by an "AND" operation.

--

Regards, Alena Rybakina Postgres Professional

Attachments:

0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patchtext/x-patch; charset=UTF-8; name=0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patchDownload+305-6
#2Ranier Vilela
ranier.vf@gmail.com
In reply to: Alena Rybakina (#1)
Re: Exists pull-up application with JoinExpr

Hi Alena.

Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <
a.rybakina@postgrespro.ru> escreveu:

Hi, hackers!

I found one pull-up that works if the inner join condition is written
through the where condition,

create temp table ta (id int primary key, val int);
insert into ta values(1,1);
insert into ta values(2,2);insert into ta values(3,3);
create temp table tb (id int primary key, aval int);
insert into tb values(4,1);
insert into tb values(5,1);
insert into tb values(1,2);

create temp table tc (id int primary key, aid int);
insert into tc values(6,1);
insert into tc values(7,2);

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT *
FROM ta
WHERE EXISTS (SELECT *
FROM tb, tc
WHERE ta.id = tb.id);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
Planning:
Buffers: shared hit=67 read=12
(14 rows)

but it doesn't work if it is written through the outside condition.

alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT *
FROM ta
WHERE EXISTS (SELECT *
FROM tb JOIN tc
ON ta.id = tb.id);
QUERY PLAN
------------------------------------------------------
Seq Scan on ta (actual rows=1 loops=1)
Filter: EXISTS(SubPlan 1)
Rows Removed by Filter: 2
Buffers: local hit=5
SubPlan 1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=4
-> Seq Scan on tb (actual rows=0 loops=3)
Filter: (ta.id = id)
Rows Removed by Filter: 3
Buffers: local hit=3
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
Planning:
Buffers: shared hit=16 read=9
(15 rows)

I have written a patch to add this functionality and now it gives an query
plan:

alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT *
FROM ta
WHERE EXISTS (SELECT *
FROM tb JOIN tc
ON ta.id = tb.id);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
(12 rows)

tb and tc form a Cartesian product, but in the case of the intersection
condition with tuples from the table ta (ta.id = tb.id). So, according to
the join condition, tb intersects only with 1, and only it gets into the
result, but at the same time they appear twice - this is because of the
Cartesian product of tb with tc
*How it works:*

I rewrote the code a bit so that it considers not only the quals in
jointree->quals, but also those in join expression
(subselect->jointree->fromlist). If they satisfy the conditions for using
pull up, I add them to the list of clauses and form a "Bool" expression
from them, joined by an "AND" operation.

I took a look at this patch and I did a little polishing on it.

And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.

The tests are failing, like this:
QUERY PLAN
 -------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
 -> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
 -> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
 -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
 Index Cond: (id = ta.id)
 Heap Fetches: 2
+ Buffers: local hit=4
 -> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)

best regards,
Ranier Vilela

Attachments:

v1-0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patchapplication/octet-stream; name=v1-0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patchDownload+293-4
#3Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Ranier Vilela (#2)
Re: Exists pull-up application with JoinExpr

On 24.12.2024 13:25, Ranier Vilela wrote:

Hi Alena.

Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina
<a.rybakina@postgrespro.ru> escreveu:

Hi, hackers!

I found one pull-up that works if the inner join condition is
written through the where condition,

|create temp table ta (id int primary key, val int); insert into
ta values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|

|create temp table tb (id int primary key, aval int); insert into
tb values(4,1); insert into tb values(5,1); insert into tb
values(1,2); create temp table tc (id int primary key, aid int);
insert into tc values(6,1); insert into tc values(7,2);|

|EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *
FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id
<http://ta.id&gt; = tb.id <http://tb.id&gt;);|
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id =ta.id <http://ta.id&gt;)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
Planning:
Buffers: shared hit=67 read=12
(14 rows)

but it doesn't work if it is written through the outside condition.

|alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON
ta.id <http://ta.id&gt; = tb.id <http://tb.id&gt;); QUERY PLAN
------------------------------------------------------ Seq Scan on
ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed
by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual
rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual
rows=0 loops=3) Filter: (ta.id <http://ta.id&gt; = id) Rows Removed
by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1
loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16
read=9 (15 rows) |

|I have written a patch to add this functionality and now it gives
an query plan: |

|alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
OFF)
 SELECT *
   FROM ta
  WHERE EXISTS (SELECT *
                  FROM tb JOIN tc
                  ON ta.id <http://ta.id&gt; = tb.id <http://tb.id&gt;);
                     QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=1 loops=1)
   Buffers: local hit=6
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Nested Loop (actual rows=0 loops=3)
         Buffers: local hit=5
         ->  Index Only Scan using tb_pkey on tb (actual rows=0
loops=3)
               Index Cond: (id = ta.id <http://ta.id&gt;)
               Heap Fetches: 1
               Buffers: local hit=4
         ->  Seq Scan on tc (actual rows=1 loops=1)
               Buffers: local hit=1
(12 rows)|

tb and tc form a Cartesian product, but in the case of the
intersection condition with tuples from the table ta (ta.id
<http://ta.id&gt; = tb.id <http://tb.id&gt;). So, according to the join
condition, tb intersects only with 1, and only it gets into the
result, but at the same time they appear twice - this is because
of the Cartesian product of tb with tc

|*How it works:*
|

I rewrote the code a bit so that it considers not only the quals
in jointree->quals, but also those in join expression
(subselect->jointree->fromlist). If they satisfy the conditions
for using pull up, I add them to the list of clauses and form a
"Bool" expression from them, joined by an "AND" operation.

I took a look at this patch and I did a little polishing on it.

And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.

The tests are failing, like this:
QUERY PLAN
 -------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
 -> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
 -> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
 -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
 Index Cond: (id = ta.id <http://ta.id&gt;)
 Heap Fetches: 2
+ Buffers: local hit=4
 -> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)

Yes, you are right) Thank you for your interest to this thread)

--
Regards,
Alena Rybakina
Postgres Professional

#4Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Alena Rybakina (#3)
Re: Exists pull-up application with JoinExpr

Hi Alena,

Thank you for your work on subqueries with JOIN.

Have you considered the scenario where in subquery includes a qual like
(tc.aid = 1)? When I tried executing those queries I receive different
results. In my opinion, to prevent this, we should add filters for such
quals within the loop 'foreach (lc, all_clauses)'

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join (actual rows=1 loops=1)
   Hash Cond: (ta.id = tb.id)
   Buffers: local hit=3
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Hash (actual rows=3 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 33kB
         Buffers: local hit=2
         ->  HashAggregate (actual rows=3 loops=1)
               Group Key: tb.id
               Batches: 1  Memory Usage: 121kB
               Buffers: local hit=2
               ->  Nested Loop (actual rows=3 loops=1)
                     Buffers: local hit=2
                     ->  Seq Scan on tb (actual rows=3 loops=1)
                           Buffers: local hit=1
                     ->  Materialize (actual rows=1 loops=3)
                           Storage: Memory  Maximum Storage: 17kB
                           Buffers: local hit=1
                           ->  Seq Scan on tc (actual rows=1 loops=1)
                                 Filter: (aid = 1)
                                 Rows Removed by Filter: 1
                                 Buffers: local hit=1
(23 rows)

============================

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id WHERE tc.aid = 1);
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on ta (actual rows=1 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 2
   Buffers: local hit=6
   SubPlan 1
     ->  Nested Loop (actual rows=0 loops=3)
           Buffers: local hit=5
           ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
                 Index Cond: (id = ta.id)
                 Heap Fetches: 1
                 Buffers: local hit=4
           ->  Seq Scan on tc (actual rows=1 loops=1)
                 Filter: (aid = 1)
                 Buffers: local hit=1
(14 rows)

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

#5Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Ilia Evdokimov (#4)
Re: Exists pull-up application with JoinExpr

Hi! Thank you for your interest to this subject!

On 27.12.2024 15:53, Ilia Evdokimov wrote:

Hi Alena,

Thank you for your work on subqueries with JOIN.

Have you considered the scenario where in subquery includes a qual
like (tc.aid = 1)? When I tried executing those queries I receive
different results. In my opinion, to prevent this, we should add
filters for such quals within the loop 'foreach (lc, all_clauses)'

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join (actual rows=1 loops=1)
   Hash Cond: (ta.id = tb.id)
   Buffers: local hit=3
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Hash (actual rows=3 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 33kB
         Buffers: local hit=2
         ->  HashAggregate (actual rows=3 loops=1)
               Group Key: tb.id
               Batches: 1  Memory Usage: 121kB
               Buffers: local hit=2
               ->  Nested Loop (actual rows=3 loops=1)
                     Buffers: local hit=2
                     ->  Seq Scan on tb (actual rows=3 loops=1)
                           Buffers: local hit=1
                     ->  Materialize (actual rows=1 loops=3)
                           Storage: Memory  Maximum Storage: 17kB
                           Buffers: local hit=1
                           ->  Seq Scan on tc (actual rows=1 loops=1)
                                 Filter: (aid = 1)
                                 Rows Removed by Filter: 1
                                 Buffers: local hit=1
(23 rows)

============================

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id WHERE tc.aid = 1);
                                QUERY PLAN
---------------------------------------------------------------------------

 Seq Scan on ta (actual rows=1 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 2
   Buffers: local hit=6
   SubPlan 1
     ->  Nested Loop (actual rows=0 loops=3)
           Buffers: local hit=5
           ->  Index Only Scan using tb_pkey on tb (actual rows=0
loops=3)
                 Index Cond: (id = ta.id)
                 Heap Fetches: 1
                 Buffers: local hit=4
           ->  Seq Scan on tc (actual rows=1 loops=1)
                 Filter: (aid = 1)
                 Buffers: local hit=1
(14 rows)

You are right, at the moment the code is not processed if there is a
constant qual in the subquery (like t1.x1=1 in the example below) and
this problem is not only related to the current patch.

For example you can get such a query plan if you complete this request
to the master:

create table t (xint);
create table t1 (x1int);
create table t2 (x2int);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT 1
FROM t
WHERE EXISTS (SELECT 1
FROM t1
where t1.x1 = 1);
QUERY PLAN
------------------------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: (InitPlan 1).col1
InitPlan 1
-> Seq Scan on t1 (actual rows=0 loops=1)
Filter: (x1 = 1)
-> Seq Scan on t (never executed)
(6 rows)

It's all because of the check in this function - this qual has levelsoup
= 0, not 1 (see (!contain_vars_of_level(whereClause, 1)), but I already
found out that by changing this, the logic of correction there is
required a little more complicated. At the moment, I'm working to add
this processing to the patch.

Thanks for the case!

--
Regards,
Alena Rybakina
Postgres Professional

#6Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Alena Rybakina (#5)
Re: Exists pull-up application with JoinExpr

Hi! I have solved it.

On 30.12.2024 11:24, Alena Rybakina wrote:

Hi! Thank you for your interest to this subject!

On 27.12.2024 15:53, Ilia Evdokimov wrote:

Hi Alena,

Thank you for your work on subqueries with JOIN.

Have you considered the scenario where in subquery includes a qual
like (tc.aid = 1)? When I tried executing those queries I receive
different results. In my opinion, to prevent this, we should add
filters for such quals within the loop 'foreach (lc, all_clauses)'

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join (actual rows=1 loops=1)
   Hash Cond: (ta.id = tb.id)
   Buffers: local hit=3
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Hash (actual rows=3 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 33kB
         Buffers: local hit=2
         ->  HashAggregate (actual rows=3 loops=1)
               Group Key: tb.id
               Batches: 1  Memory Usage: 121kB
               Buffers: local hit=2
               ->  Nested Loop (actual rows=3 loops=1)
                     Buffers: local hit=2
                     ->  Seq Scan on tb (actual rows=3 loops=1)
                           Buffers: local hit=1
                     ->  Materialize (actual rows=1 loops=3)
                           Storage: Memory  Maximum Storage: 17kB
                           Buffers: local hit=1
                           ->  Seq Scan on tc (actual rows=1 loops=1)
                                 Filter: (aid = 1)
                                 Rows Removed by Filter: 1
                                 Buffers: local hit=1
(23 rows)

============================

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id WHERE tc.aid = 1);
                                QUERY PLAN
---------------------------------------------------------------------------

 Seq Scan on ta (actual rows=1 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 2
   Buffers: local hit=6
   SubPlan 1
     ->  Nested Loop (actual rows=0 loops=3)
           Buffers: local hit=5
           ->  Index Only Scan using tb_pkey on tb (actual rows=0
loops=3)
                 Index Cond: (id = ta.id)
                 Heap Fetches: 1
                 Buffers: local hit=4
           ->  Seq Scan on tc (actual rows=1 loops=1)
                 Filter: (aid = 1)
                 Buffers: local hit=1
(14 rows)

You are right, at the moment the code is not processed if there is a
constant qual in the subquery (like t1.x1=1 in the example below) and
this problem is not only related to the current patch.

For example you can get such a query plan if you complete this request
to the master:

create table t (xint);
create table t1 (x1int);
create table t2 (x2int);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT 1
FROM t
WHERE EXISTS (SELECT 1
FROM t1
where t1.x1 = 1);
QUERY PLAN
------------------------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: (InitPlan 1).col1
InitPlan 1
-> Seq Scan on t1 (actual rows=0 loops=1)
Filter: (x1 = 1)
-> Seq Scan on t (never executed)
(6 rows)

It's all because of the check in this function - this qual has
levelsoup = 0, not 1 (see (!contain_vars_of_level(whereClause, 1)),
but I already found out that by changing this, the logic of correction
there is required a little more complicated. At the moment, I'm
working to add this processing to the patch.

Thanks for the case!

The logic is the same, but extended to constants. I added a few more
tests that not only cover this case, but also NOT EXISTS, which will be
converted to ANTI JOIN.

--
Regards,
Alena Rybakina
Postgres Professional

Attachments:

v2-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patchtext/x-patch; charset=UTF-8; name=v2-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patchDownload+433-34
#7Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Alena Rybakina (#6)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi! I found another example where the transformation worked incorrectly
and reconsidered the idea.

As for conversion of exists_sublink_to_ANY, we need to get the flattened
implicit-AND list of clauses and pull out the chunks of the WHERE clause
that belong to the parent query,
since we are called halfway through the parent's preprocess_expression()
and earlier steps of preprocess_expression() wouldn't get applied to the
pulled-up stuff unless we do them here.
We also do some processing for vars depending on which side the var is
on - if it's in a subquery, we only need to lower its level (varlevel)
because subquery will be flatted, while
for other vars that belong to the parent query, we need to do
preparation to pull up the sub-select into top range table.

For those expressions that we couldn't assign to either list, we define
newWhere and apply both cases.

--
Regards,
Alena Rybakina
Postgres Professional

Attachments:

v3-0001-Add-appling-the-pull-up-transformation-for-expressio.patchtext/x-patch; charset=UTF-8; name=v3-0001-Add-appling-the-pull-up-transformation-for-expressio.patchDownload+550-63
#8Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Alena Rybakina (#7)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

On 09.02.2025 18:14, Alena Rybakina wrote:

Hi! I found another example where the transformation worked
incorrectly and reconsidered the idea.

As for conversion of exists_sublink_to_ANY, we need to get the
flattened implicit-AND list of clauses and pull out the chunks of the
WHERE clause that belong to the parent query,
since we are called halfway through the parent's
preprocess_expression() and earlier steps of preprocess_expression()
wouldn't get applied to the pulled-up stuff unless we do them here.
We also do some processing for vars depending on which side the var is
on - if it's in a subquery, we only need to lower its level (varlevel)
because subquery will be flatted, while
for other vars that belong to the parent query, we need to do
preparation to pull up the sub-select into top range table.

For those expressions that we couldn't assign to either list, we
define newWhere and apply both cases.

When I run 'make -C contrib/ check', tests of postgres_fdw extension
failed. I might be wrong, but you should be careful with LIMIT.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

regression.diffstext/plain; charset=UTF-8; name=regression.diffsDownload+21-10
#9Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Ilia Evdokimov (#8)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

On 10.02.2025 23:51, Ilia Evdokimov wrote:

On 09.02.2025 18:14, Alena Rybakina wrote:

Hi! I found another example where the transformation worked
incorrectly and reconsidered the idea.

As for conversion of exists_sublink_to_ANY, we need to get the
flattened implicit-AND list of clauses and pull out the chunks of the
WHERE clause that belong to the parent query,
since we are called halfway through the parent's
preprocess_expression() and earlier steps of preprocess_expression()
wouldn't get applied to the pulled-up stuff unless we do them here.
We also do some processing for vars depending on which side the var
is on - if it's in a subquery, we only need to lower its level
(varlevel) because subquery will be flatted, while
for other vars that belong to the parent query, we need to do
preparation to pull up the sub-select into top range table.

For those expressions that we couldn't assign to either list, we
define newWhere and apply both cases.

When I run 'make -C contrib/ check', tests of postgres_fdw extension
failed. I might be wrong, but you should be careful with LIMIT.

Thank you for the review, I'm working on it.

--
Regards,
Alena Rybakina
Postgres Professional

#10Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Alena Rybakina (#9)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi!

My colleague reviewed my patch and gave feedback on how to improve it -
for some queries with data types that I did not consider, pull-up is not
applied, although it should. Some of them:

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
  FROM ta
 WHERE EXISTS (SELECT 1
                 FROM tb
                 JOIN tc
                   ON ta.id = tb.id
                  AND tb.id = ANY('{1}'::int[])
              );

                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on ta (actual rows=1.00 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 1
   SubPlan 1
     ->  Nested Loop (actual rows=0.50 loops=2)
           ->  Seq Scan on tb (actual rows=0.50 loops=2)
                 Filter: ((id = ANY ('{1}'::integer[])) AND (ta.id = id))
                 Rows Removed by Filter: 2
           ->  Seq Scan on tc (actual rows=1.00 loops=1)

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
 SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.is_active
               );
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on ta (actual rows=2.00 loops=1)
   Filter: EXISTS(SubPlan 1)
   SubPlan 1
     ->  Nested Loop (actual rows=1.00 loops=2)
           ->  Seq Scan on tb (actual rows=1.00 loops=2)
                 Filter: (is_active AND (ta.id = id))
                 Rows Removed by Filter: 0
           ->  Seq Scan on tc (actual rows=1.00 loops=2)

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
 SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.is_active IS NOT NULL
               );

                                   QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on ta (actual rows=2.00 loops=1)
   Filter: EXISTS(SubPlan 1)
    SubPlan 1
       ->  Nested Loop (actual rows=1.00 loops=2)
             ->  Seq Scan on tb (actual rows=1.00 loops=2)
                   Filter: ((is_active IS NOT NULL) AND (ta.id = id))
                   Rows Removed by Filter: 0
             ->  Seq Scan on tc (actual rows=1.00 loops=2)

UPDATE tb SET is_active = NULL WHERE id = 2;

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
 SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.is_active IS NULL
               );
                              QUERY PLAN
----------------------------------------------------------------
 Seq Scan on ta (actual rows=1.00 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 1
   SubPlan 1
     ->  Nested Loop (actual rows=0.50 loops=2)
           ->  Seq Scan on tb (actual rows=0.50 loops=2)
                 Filter: ((is_active IS NULL) AND (ta.id = id))
                 Rows Removed by Filter: 4
           ->  Seq Scan on tc (actual rows=1.00 loops=1)

I see that I need to add a walker that, when traversing the tree,
determines whether there are conditions under which pull-up is
impossible - the presence of
volatility of functions and other restrictions, and leave the
transformation for the var objects that I added before, I described it
here.

Unfortunately, I need a few days to implement this and need time for a
review, and I think I will not have time to do this before the code
freeze, so
I am moving this to the next commitfest and not changing the status
"awaiting the author".

On 11.02.2025 18:59, Alena Rybakina wrote:

On 10.02.2025 23:51, Ilia Evdokimov wrote:

On 09.02.2025 18:14, Alena Rybakina wrote:

Hi! I found another example where the transformation worked
incorrectly and reconsidered the idea.

As for conversion of exists_sublink_to_ANY, we need to get the
flattened implicit-AND list of clauses and pull out the chunks of
the WHERE clause that belong to the parent query,
since we are called halfway through the parent's
preprocess_expression() and earlier steps of preprocess_expression()
wouldn't get applied to the pulled-up stuff unless we do them here.
We also do some processing for vars depending on which side the var
is on - if it's in a subquery, we only need to lower its level
(varlevel) because subquery will be flatted, while
for other vars that belong to the parent query, we need to do
preparation to pull up the sub-select into top range table.

For those expressions that we couldn't assign to either list, we
define newWhere and apply both cases.

When I run 'make -C contrib/ check', tests of postgres_fdw extension
failed. I might be wrong, but you should be careful with LIMIT.

Thank you for the review, I'm working on it.

Sorry for not responding, but I will fix this bug after I update the
code based on the comments above. Thank you for noticing and writing to
me, your feedback is very important.

--
Regards,
Alena Rybakina
Postgres Professional

#11Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Alena Rybakina (#10)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

On 02.04.2025 19:39, Alena Rybakina wrote:

I see that I need to add a walker that, when traversing the tree,
determines whether there are conditions under which pull-up is
impossible - the presence of
volatility of functions and other restrictions, and leave the
transformation for the var objects that I added before, I described it
here.

I have some concerns about pulling up every clause from the subquery
with one column. In particular, not every clause is safe or beneficial
to pull up: OR-clauses, CASE expressions, nested sublinks could
significantly change how the planner estimates the number of rows or
applies filters, especially when they are not true join predicates.
Pulling them up might lead to worse plans, or even change the semantics
in subtle ways. I think before applying such transformations, we should
make sure they are not only safe but actually improve the resulting plan.

#12Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Ilia Evdokimov (#11)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi, all! I updated the patch and it looks nice. All the problems have
been solved.

On 03.04.2025 16:56, Ilia Evdokimov wrote:

On 02.04.2025 19:39, Alena Rybakina wrote:

I see that I need to add a walker that, when traversing the tree,
determines whether there are conditions under which pull-up is
impossible - the presence of
volatility of functions and other restrictions, and leave the
transformation for the var objects that I added before, I described
it here.

I have some concerns about pulling up every clause from the subquery
with one column. In particular, not every clause is safe or beneficial
to pull up: OR-clauses, CASE expressions, nested sublinks could
significantly change how the planner estimates the number of rows or
applies filters, especially when they are not true join predicates.
Pulling them up might lead to worse plans, or even change the
semantics in subtle ways. I think before applying such
transformations, we should make sure they are not only safe but
actually improve the resulting plan.

There may indeed be cases where a query plan without pull-up is worse
than with pull-up.

For example, as shown below, with pull-up we don't need to scan two
tables and perform a join, since the subquery returns 0 rows (no
matching tuples in the inner sequential scan in a parameterized Nested
Loop).
However, this cannot be detected at the current planning stage - we
simply don't have that information yet.

Do you have any ideas on how to solve this problem? So far, the only
approach I see is to try an alternative plan but I'm still learning this.

For example:

create table t(x int);
create table t1(x int);
create table t2(x int);

insert into t2 select id from generate_series(20001,30000) as id;
insert into t1 select id from generate_series(10001,20000) as id;
insert into t select id from generate_series(1,10000) as id;
vacuum analyze;
explain analyze select * from t where exists (select * from t1 join t2
on t.x = t1.x);

with my patch:
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1500540.00..1500822.50 rows=10000 width=4) (actual
time=70694.658..70694.662 rows=0.00 loops=1)
   Hash Cond: (t.x = t1.x)
   Buffers: shared hit=135
*->* *Seq Scan on t*  (cost=0.00..145.00 rows=10000 width=4) (actual
time=0.009..1.545 *rows=10000.00* loops=1)
         Buffers: shared hit=45
   ->  Hash  (cost=1500415.00..1500415.00 rows=10000 width=4) (actual
time=70690.524..70690.526 rows=10000.00 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 480kB
         Buffers: shared hit=90
         ->  HashAggregate  (cost=1500315.00..1500415.00 rows=10000
width=4) (actual time=70683.143..70686.590 rows=10000.00 loops=1)
               Group Key: t1.x
               Batches: 1  Memory Usage: 793kB
               Buffers: shared hit=90
*->* *Nested Loop* (cost=0.00..1250315.00 rows=100000000 width=4)
(actual time=0.019..25650.447 *rows=100000000.00* loops=1)
                     Buffers: shared hit=90
*->  Seq Scan on t1* (cost=0.00..145.00 rows=10000 width=4) (actual
time=0.006..4.931 *rows=10000.00* loops=1)
                           Buffers: shared hit=45
                     ->  Materialize  (cost=0.00..195.00 rows=10000
width=0) (actual time=0.000..0.875 rows=10000.00 loops=10000)
                           Storage: Memory  Maximum Storage: 519kB
                           Buffers: shared hit=45
                           ->  Seq Scan on t2  (cost=0.00..145.00
rows=10000 width=0) (actual time=0.007..1.246 rows=10000.00 loops=1)
                                 Buffers: shared hit=45
 Planning:
   Buffers: shared hit=36 read=3
 Planning Time: 0.375 ms
*Execution Time: 70695.154 ms*

without my patch:

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
*Seq Scan on t*  (cost=0.00..309.30 rows=5738 width=4) (actual
time=68268.562..68268.565 *rows=0.00* loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 10000
   Buffers: shared hit=900045
   SubPlan 1
     -> *Nested Loop*  (cost=0.00..8524.27 rows=654075 width=0) (actual
time=6.823..6.823 *rows=0.00* loops=10000)
           Buffers: shared hit=900000
           ->  Seq Scan on t2  (cost=0.00..159.75 rows=11475 width=0)
(actual time=0.011..1.660 rows=10000.00 loops=10000)
                 Buffers: shared hit=450000
           ->  Materialize  (cost=0.00..188.72 rows=57 width=0) (actual
time=0.000..0.000 rows=0.00 loops=100000000)
                 Storage: Memory  Maximum Storage: 17kB
                 Buffers: shared hit=450000
                 ->  Seq Scan on t1  (cost=0.00..188.44 rows=57
width=0) (actual time=2.403..2.403 *rows=0.00* loops=10000)
                       Filter: (t.x = x)
                       Rows Removed by Filter: 10000
                       Buffers: shared hit=450000
 Planning:
   Buffers: shared hit=40 read=16
 Planning Time: 0.487 ms
 Execution Time: *68268.600 ms*

--
Regards,
Alena Rybakina
Postgres Professional

Attachments:

v4-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patchtext/x-patch; charset=UTF-8; name=v4-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patchDownload+1095-23
#13solaimurugan vellaipandiyan
drsolaimurugan.v@gmail.com
In reply to: Alena Rybakina (#16)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi Alena,

I tested the latest patch on current master and verified the planner
behavior for the EXISTS pull-up cases described in the thread.

Before applying the patch, the following query produced an EXISTS(SubPlan) plan:

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT *
FROM ta
WHERE EXISTS (
SELECT *
FROM tb
JOIN tc ON ta.id = tb.id
);

Plan before patch:
Seq Scan on ta
Filter: EXISTS(SubPlan exists_1)

After applying the patch, the same query is transformed into:

Nested Loop Semi Join

with Index Only Scan on tb_pkey, so the pull-up optimization is now
applied correctly for JOIN ON clauses referencing outer Vars.

I also tested some additional cases mentioned in the discussion:

1. NOT EXISTS case

planner generated Nested Loop Anti Join as expected.

2 . Constant qual case (WHERE tc.aid = 1)

planner produced an optimized Hash Join plan and avoided EXISTS(SubPlan).

3. ANY array condition

tb.id = ANY('{1}'::int[])

planner successfully used Index Only Scan and generated an
optimized join plan.

In all tested cases, query results remained correct and I did not
observe incorrect transformations during my testing.

Overall the patch behavior looks good from my side and the planner now
behaves consistently for these EXISTS pull-up scenarios.

Regards,
Solaimurugan V

#14Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Alena Rybakina (#12)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

On 04.06.2025 13:40, Alena Rybakina wrote:

Do you have any ideas on how to solve this problem? So far, the only
approach I see is to try an alternative plan but I'm still learning this.

Hi,

I've reviewed this patch, and I have suggestion about the approach.

Currently, the patch extends 'convert_EXISTS_sublick_to_join' with quite
complex logic (clause collection, volatile checks, rewriting join quals,
etc). While it works, the amount of branching and special cases makes
the function harder to follow.

Looking at the logic, it seems that a large part of the complexity comes
from trying to directly adapt 'convert_EXISTS_sublink_to_join' instead
of factoring out a dedicated path. An alternative would be to introduce
a separate function *'convert_EXISTS_sublink_to_lateral_join' *- with a
similar API to 'convert_ANY_sublink_to_join'. Such a function can focus
only on the EXISTS-to-join case, while keeping the existing function
shorter and easier to reason about.

I even made some first rough sketches of this approach (not a finished
patch, just an outline). Of course, it would still need proper
adaptation, but I think it demonstrates that the overall structure can
be kept simpler.

What do you think about refactoring in this direction?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Attachments:

convert_EXISTS_sublink_to_lateral_join.patchtext/x-patch; charset=UTF-8; name=convert_EXISTS_sublink_to_lateral_join.patchDownload+93-0
#15Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Ilia Evdokimov (#14)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi! Thank you for your attention to this patch!

On 03.09.2025 00:07, Ilia Evdokimov wrote:

On 04.06.2025 13:40, Alena Rybakina wrote:

Do you have any ideas on how to solve this problem? So far, the only
approach I see is to try an alternative plan but I'm still learning this.

Hi,

I've reviewed this patch, and I have suggestion about the approach.

Currently, the patch extends 'convert_EXISTS_sublick_to_join' with
quite complex logic (clause collection, volatile checks, rewriting
join quals, etc). While it works, the amount of branching and special
cases makes the function harder to follow.

Looking at the logic, it seems that a large part of the complexity
comes from trying to directly adapt 'convert_EXISTS_sublink_to_join'
instead of factoring out a dedicated path. An alternative would be to
introduce a separate function
*'convert_EXISTS_sublink_to_lateral_join' *- with a similar API to
'convert_ANY_sublink_to_join'. Such a function can focus only on the
EXISTS-to-join case, while keeping the existing function shorter and
easier to reason about.

I even made some first rough sketches of this approach (not a finished
patch, just an outline). Of course, it would still need proper
adaptation, but I think it demonstrates that the overall structure can
be kept simpler.

What do you think about refactoring in this direction?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

I'll look at this. I need some time to consider it.

I'm working on this approach right now. I introduced mutator and made
the transformation if it is possible there but I need to fix some bugs.

Attachments:

pull_up.difftext/x-patch; charset=UTF-8; name=pull_up.diffDownload+1741-133
#16Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Alena Rybakina (#15)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi, I rebased the patch and fixed earlier problem.

Show quoted text

On 03.09.2025 00:20, Alena Rybakina wrote:

I'll look at this. I need some time to consider it.

I'm working on this approach right now. I introduced mutator and made
the transformation if it is possible there but I need to fix some bugs.

Attachments:

v5-0001-Enables-pull-up-of-EXISTS-subqueries-tha.patchtext/x-patch; charset=UTF-8; name=v5-0001-Enables-pull-up-of-EXISTS-subqueries-tha.patchDownload+1736-133
#17Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Ilia Evdokimov (#14)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hi!

On 03.09.2025 00:07, Ilia Evdokimov wrote:

Hi,

I've reviewed this patch, and I have suggestion about the approach.

Currently, the patch extends 'convert_EXISTS_sublick_to_join' with
quite complex logic (clause collection, volatile checks, rewriting
join quals, etc). While it works, the amount of branching and special
cases makes the function harder to follow.

Looking at the logic, it seems that a large part of the complexity
comes from trying to directly adapt 'convert_EXISTS_sublink_to_join'
instead of factoring out a dedicated path. An alternative would be to
introduce a separate function
*'convert_EXISTS_sublink_to_lateral_join' *- with a similar API to
'convert_ANY_sublink_to_join'. Such a function can focus only on the
EXISTS-to-join case, while keeping the existing function shorter and
easier to reason about.

I even made some first rough sketches of this approach (not a finished
patch, just an outline). Of course, it would still need proper
adaptation, but I think it demonstrates that the overall structure can
be kept simpler.

What do you think about refactoring in this direction?

I think this approach isn’t fully correct. By forming a join between a
subquery and the outer relation, you effectively force the optimizer to
choose a Nested Loop join. Furthermore, it prevents the planner from
exploring all join orders between the subquery’s tables and the outer
relation, so we may miss a more optimal plan.

With your patch, I consistently get the following plan. I even disabled
nested loops to see whether the planner could switch to a Hash Join or
Merge Join, but those aren’t applicable with lateral parameters in this
pattern.

CREATE TABLE ta (id int PRIMARY KEY, val int);
INSERT INTO ta VALUES (1,1), (2,2);

CREATE TABLE tb (id int PRIMARY KEY, aval int);
INSERT INTO tb VALUES (1,1), (2,1);

CREATE TABLE tc (id int PRIMARY KEY, aid int);
INSERT INTO tc VALUES (3,5), (1,5);

CREATE TABLE td (id int PRIMARY KEY, aid int);
INSERT INTO td VALUES (1,6), (2,7), (3,8), (4,9);

CREATE TABLE te (id int PRIMARY KEY, aid int);
INSERT INTO te VALUES (5,6), (6,7), (7,8), (4,9), (1,1);

SET enable_nestloop = OFF;

EXPLAIN ANALYZE
SELECT ta.id
FROM ta
WHERE EXISTS (
  SELECT 1
  FROM tb
  WHERE tb.id = ta.id
    AND EXISTS (SELECT 1 FROM tc WHERE tc.id = tb.id)
);

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.31..37017.50 rows=2260 width=4) (actual
time=0.116..0.142 rows=2.00 loops=1)
   Disabled: true
   Buffers: local hit=9
   ->  Seq Scan on ta  (cost=0.00..32.60 rows=2260 width=4) (actual
time=0.037..0.039 rows=2.00 loops=1)
         Buffers: local hit=1

   ->  Nested Loop Semi Join  (cost=0.31..16.36 rows=1 width=0) (actual
time=0.045..0.046 rows=1.00 loops=2)
         Disabled: true
         Buffers: local hit=8
         ->  Index Only Scan using tb_pkey on tb (cost=0.15..8.17
rows=1 width=4) (actual time=0.030..0.030 rows=1.00 loops=2)
               Index Cond: (id = ta.id)
               Heap Fetches: 2
               Index Searches: 2
               Buffers: local hit=4
         ->  Index Only Scan using tc_pkey on tc (cost=0.15..8.17
rows=1 width=4) (actual time=0.010..0.010 rows=1.00 loops=2)
               Index Cond: (id = ta.id)
               Heap Fetches: 2
               Index Searches: 2
               Buffers: local hit=4
 Planning Time: 0.539 ms
 Execution Time: 0.252 ms
(20 rows)

Anyway, thank you for the work and attention here - your feedback was
useful!

I’ve also rebased the patch on current master.

Attachments:

v6-0001-Enables-pull-up-of-EXISTS-subqueries-that-contain-IN.patchtext/x-patch; charset=UTF-8; name=v6-0001-Enables-pull-up-of-EXISTS-subqueries-that-contain-IN.patchDownload+1487-47
#18Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Alena Rybakina (#17)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

I have updated the patch with some bug-fixes after having run Smith's
test with lateral and null elements.

To detect lateral elements I left the check that there are no one outer
references left in the subquery after the mutator transformation.

Everything looks well now. I currently continue testing my patch to
detect more corner-cases.

Feel free to review my patch!

-----------
Best regards,
Yandex Cloud
Alena Rybakina

Attachments:

v7-0001-Enables-pull-up-of-EXISTS-subqueries-that-contain-IN.patchtext/plain; charset=UTF-8; name=v7-0001-Enables-pull-up-of-EXISTS-subqueries-that-contain-IN.patchDownload+1508-59
#19Petr Petrov
p.petrov@postgrespro.ru
In reply to: solaimurugan vellaipandiyan (#13)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hello Alena!

Sorry for my long silence, if you are interested in my thoughts about
your patch then here they are:

1)  First of all, let's read the comment below.

/*
 * Separate out the WHERE clause.  (We could theoretically also remove
 * top-level plain JOIN/ON clauses, but it's probably not worth the
 * trouble.)
 */

We need to separate two things: the jointree and the WHERE clause, so
it's possible to do something like this

whereClause = subselect->jointree->quals;
subselect->jointree->quals = NULL;

jointree = subselect->jointree;
subselect->jointree = NULL;

if (contain_vars_of_level((Node *) subselect, 1))
    return NULL;

/* Do our checks in the jointree and stop if we can't do pullup */

/* Return the jointree back */
subselect->jointree = jointree;

I think it's more clear than in the patch right now.

2) We don't need to use get_relids_in_jointree() and nullable_above
since the tree's traversing is from the top to the bottom and we know that

     in LEFT JOIN and FULL JOIN LHS is nullable
     in RIGHT JOIN and FULL JOIN RHS  is nullable

So we can use a boolean variables like this

rarg_is_nullable = (is_nullable_side ||
                                  j->jointype == JOIN_FULL ||
                                  j->jointype == JOIN_LEFT);

larg_is_nullable = (is_nullable_side ||
                                 j->jointype == JOIN_FULL ||
                                 j->jointype == JOIN_RIGHT);

And then work with them. I suspect it will be much easier to follow that
in the patch right now.

I fear that you don't check FULL JOINS here.

3) To be honest, we just work with the top jointree, we don't descend to
subqueries, therefore, I am not sure that the mutator is a good name here.

    AFAICS, mutators were designed to modify something including some
parts of the subqueries but it's not the case here.

    It's a simple jointree traversal, we don't need
HoistJoinQualsContext as well.

    I think, we need three things here:

    Node *node - the node in the jointree we are working with

    bool is_nullable_side - are we on the nullable side of some outer join

    List **exprs - the list in which we collect JoinExpr and FromExpr
with outer references.

    I propose just traverse the jointree, make our checks and then
collect JoinExpr and FromExpr for the further processing if everything
is good.

4) After checking the WHERE clause and the jointree we can traverse our
list, make a new whereClause by appending quals with outer references
and then replace quals in JoinExprs to constant true.

    Something like make_and_qual()

5) I have also noticed that you are using canonicalize_qual()

    I suspect we don't need it either since it will be called later.
And here is the stack

    subquery_planner
       pull_up_sublinks(root)
       preprocess_qual_conditions(root, (Node *) parse->jointree)
           j->quals = preprocess_expression(root, j->quals, EXPRKIND_QUAL)
              expr = (Node *) canonicalize_qual((Expr *) expr, false)
                 find_duplicate_ors(qual, is_check)

   So we will flatten all nested BoolExpr with the type BOOL_AND. Let's
allow Postgres do its job.

6) I have noticed the new output from one regression test. The previous
output was

    Merge Anti Join
      Merge Cond: (t1.c1 = t2.c2)
       ->  Sort
               Sort Key: t1.c1
               ->  Seq Scan on tt4x t1
       ->  Sort

   and the new output is the Hash Anti Join

   Hash Anti Join
       Hash Cond: (t1.c1 = t2.c2)
        ->  Seq Scan on tt4x t1
        ->  Hash

  This is very suspicious, why does this patch cause such changes?

7) There is a SubLink which won't be pulled up, you can see it below

explain (COSTS OFF)
SELECT *
  FROM tenk1 A
 WHERE EXISTS (SELECT 1
                                FROM tenk2 B, (SELECT f.hundred
 FROM (SELECT A.hundred) AS f
                                                           ) AS v
                             WHERE B.hundred = v.hundred
              );

And the reason is the following. (SELECT A.hundred) is the subquery and
won't be processed until the function  pull_up_sublinks() finishes its job.

But it lead us to an interesting observation. Maybe we could process it
somewhere in the SS_process_sublinks().

Perhaps, f and v subqueries could be pulled up and then the EXISTS
sublink could be replaced with a SEMI JOIN.

I don't know whether we should solve this problem right now. I have
never seen such queries in real workloads but it doesn't mean that they
are not exists.

In short, I think your patch can be simplified.

On 6/26/25 18:40, solaimurugan vellaipandiyan wrote:

In all tested cases, query results remained correct and I did not
observe incorrect transformations during my testing.

Overall the patch behavior looks good from my side and the planner now
behaves consistently for these EXISTS pull-up scenarios.

Regards,
Solaimurugan V

Sincerely yours.

Peter Petrov.

#20Peter Petrov
pspetrov91@gmail.com
In reply to: Alena Rybakina (#18)
Re: pull-up subquery if JOIN-ON contains refs to upper-query

Hello Alena!

Sorry for my long silence, if you are interested in my thoughts about your
patch then here they are:

1) First of all, let's read the comment below.

/*
* Separate out the WHERE clause. (We could theoretically also remove
* top-level plain JOIN/ON clauses, but it's probably not worth the
* trouble.)
*/

We need to separate two things: the jointree and the WHERE clause, so it's
possible to do something like this

whereClause = subselect->jointree->quals;
subselect->jointree->quals = NULL;

jointree = subselect->jointree;
subselect->jointree = NULL;

if (contain_vars_of_level((Node *) subselect, 1))
return NULL;

/* Do our checks in the jointree and stop if we can't do pullup */

/* Return the jointree back */
subselect->jointree = jointree;

I think it's more clear than in the patch right now.

2) We don't need to use get_relids_in_jointree() and nullable_above since
the tree's traversing is from the top to the bottom and we know that

in LEFT JOIN and FULL JOIN LHS is nullable
in RIGHT JOIN and FULL JOIN RHS is nullable

So we can use a boolean variables like this

rarg_is_nullable = (is_nullable_side ||
j->jointype == JOIN_FULL ||
j->jointype == JOIN_LEFT);

larg_is_nullable = (is_nullable_side ||
j->jointype == JOIN_FULL ||
j->jointype == JOIN_RIGHT);

And then work with them. I suspect it will be much easier to follow that in
the patch right now.

I fear that you don't check FULL JOINS here.

3) To be honest, we just work with the top jointree, we don't descend to
subqueries, therefore, I am not sure that the mutator is a good name here.

AFAICS, mutators were designed to modify something including some parts
of the subqueries but it's not the case here.

It's a simple jointree traversal, we don't need HoistJoinQualsContext
as well.

I think, we need three things here:

Node *node - the node in the jointree we are working with

bool is_nullable_side - are we on the nullable side of some outer join

List **exprs - the list in which we collect JoinExpr and FromExpr with
outer references.

I propose just traverse the jointree, make our checks and then collect
JoinExpr and FromExpr for the further processing if everything is good.

4) After checking the WHERE clause and the jointree we can traverse our
list, make a new whereClause by appending quals with outer references and
then replace quals in JoinExprs to constant true.

Something like make_and_qual()

5) I have also noticed that you are using canonicalize_qual()

I suspect we don't need it either since it will be called later. And
here is the stack

subquery_planner
pull_up_sublinks(root)
preprocess_qual_conditions(root, (Node *) parse->jointree)
j->quals = preprocess_expression(root, j->quals, EXPRKIND_QUAL)
expr = (Node *) canonicalize_qual((Expr *) expr, false)
find_duplicate_ors(qual, is_check)

So we will flatten all nested BoolExpr with the type BOOL_AND.

6) I have noticed the new output from one regression test. The previous
output was

Merge Anti Join
Merge Cond: (t1.c1 = t2.c2)
-> Sort
Sort Key: t1.c1
-> Seq Scan on tt4x t1
-> Sort

and the new output is the Hash Anti Join

Hash Anti Join
Hash Cond: (t1.c1 = t2.c2)
-> Seq Scan on tt4x t1
-> Hash

This is very suspicious, why does this patch cause such changes?

7) There is a SubLink which won't be pulled up, you can see it below

explain (COSTS OFF)
SELECT *
FROM tenk1 A
WHERE EXISTS (SELECT 1
FROM tenk2 B, (SELECT f.hundred
FROM (SELECT
A.hundred) AS f
) AS v
WHERE B.hundred = v.hundred
);
And the reason is the following. (SELECT A.hundred) is the subquery and
won't be processed until the function pull_up_sublinks() finishes its job.

But it leads us to an interesting observation. Maybe we could process it
somewhere in the SS_process_sublinks().

Perhaps, f and v subqueries could be pulled up and then the EXISTS sublink
could be replaced with a SEMI JOIN.

I don't know whether we should solve this problem right now. I have never
seen such queries in real workloads but it doesn't mean that they don't
exist.

In short, I think your patch can be simplified.

пт, 8 мая 2026 г. в 20:37, Alena Rybakina <lena.ribackina@yandex.ru>:

Show quoted text

Feel free to review my patch!

-----------
Best regards,
Yandex Cloud
Alena Rybakina

#21solaimurugan vellaipandiyan
drsolaimurugan.v@gmail.com
In reply to: Petr Petrov (#19)
#22Alena Rybakina
lena.ribackina@yandex.ru
In reply to: solaimurugan vellaipandiyan (#21)
#23Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Peter Petrov (#20)
#24Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Alena Rybakina (#23)