One question about transformation ANY Sublinks into joins

Started by Armor over 9 years ago, 4 messages

yupengstone@qq.com

over 9 years ago

Hi
I ran a simple query on the latest PG:
postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
           Filter: (t1.c1 = c2)
(5 rows)

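The table schemas are as follows:

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id1    | integer |
 c1     | integer |

postgres=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id2    | integer |
 c2     | integer |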

I found that PG decides not to pull up this sublink because the WHERE clause of the sublink refers to Vars of the parent query; for details, see the function convert_ANY_sublink_to_join in src/backend/optimizer/plan/subselect.c.
However, for such a simple sublink, which has no aggregates, no window functions, and no LIMIT, maybe we can carefully pull up the predicates in the WHERE clause that refer to Vars of the parent query, then pull up the sublink itself and produce a query plan like the following:

postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
                               QUERY PLAN
------------------------------------------------------------------------
 Hash Join  (cost=49.55..99.23 rows=565 width=8)
   Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
   ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=46.16..46.16 rows=226 width=8)
         ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
               Group Key: t2.id2, t2.c2
               ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)
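
One way to check that such a rewrite preserves the meaning of the query is to run both forms side by side. The sketch below does this from Python against SQLite, which stands in for PostgreSQL only to compare result sets; the sample rows are made up for illustration, and it says nothing about what the PostgreSQL planner actually does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id1 integer, c1 integer);
    CREATE TABLE t2 (id2 integer, c2 integer);
    -- Made-up sample rows; t2 contains duplicates on (id2, c2).
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO t2 VALUES (1, 10), (1, 10), (2, 99), (3, 30), (3, 30), (5, 40);
""")

# Original form: IN sublink whose WHERE clause refers to t1.c1.
sublink_rows = conn.execute("""
    SELECT id1, c1 FROM t1
    WHERE id1 IN (SELECT id2 FROM t2 WHERE c1 = c2)
    ORDER BY id1
""").fetchall()

# Hand-written equivalent of the proposed plan: de-duplicate t2 on
# (id2, c2), then join on both the IN condition and the pulled-up predicate.
join_rows = conn.execute("""
    SELECT t1.id1, t1.c1
    FROM t1
    JOIN (SELECT DISTINCT id2, c2 FROM t2) AS u
      ON t1.id1 = u.id2 AND t1.c1 = u.c2
    ORDER BY t1.id1
""").fetchall()

print(sublink_rows)
print(join_rows)
```

With this sample data both queries return the same rows, even though t2 holds duplicate (id2, c2) pairs.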

------------------
Jerry Yu
https://github.com/scarbrofair

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Armor (#1)

Re: One question about transformation ANY Sublinks into joins

On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengstone@qq.com> wrote:

Hi
I ran a simple query on the latest PG:
postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
           Filter: (t1.c1 = c2)
(5 rows)

The table schemas are as follows:

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id1    | integer |
 c1     | integer |

postgres=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id2    | integer |
 c2     | integer |

I found that PG decides not to pull up this sublink because the WHERE clause
of the sublink refers to Vars of the parent query; for details, see the
function convert_ANY_sublink_to_join in
src/backend/optimizer/plan/subselect.c.
However, for such a simple sublink, which has no aggregates, no window
functions, and no LIMIT, maybe we can carefully pull up the predicates in the
WHERE clause that refer to Vars of the parent query, then pull up the sublink
itself and produce a query plan like the following:

postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
                               QUERY PLAN
------------------------------------------------------------------------
 Hash Join  (cost=49.55..99.23 rows=565 width=8)
   Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
   ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=46.16..46.16 rows=226 width=8)
         ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
               Group Key: t2.id2, t2.c2
               ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dilip Kumar

dilipbalaut@gmail.com

over 9 years ago

In reply to: Robert Haas (#2)

Re: One question about transformation ANY Sublinks into joins

On Thu, Jul 21, 2016 at 9:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

I guess a plain Hash Join will do here, because the inner Hash node sits on
top of a HashAggregate with group key (t2.id2, t2.c2), and the hash join
condition is (t1.id1 = t2.id2) AND (t1.c1 = t2.c2).

So I think these two together ensure that we don't get duplicate tuples
for any one outer record.
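
The point about duplicates can be illustrated with a small sketch in plain Python. This is not PostgreSQL code; the helper name hash_join and the row values are made up for illustration. A plain hash join emits one output row per matching inner row, so it only behaves like the IN test once the inner side has been made unique on the join keys.

```python
from collections import defaultdict

def hash_join(outer_rows, inner_rows):
    """Plain inner hash join on the whole row: one output per matching inner row."""
    table = defaultdict(list)
    for row in inner_rows:              # build phase: hash the inner side
        table[row].append(row)
    out = []
    for row in outer_rows:              # probe phase
        for _match in table.get(row, []):
            out.append(row)
    return out

t1 = [(1, 10), (2, 20), (3, 30)]            # (id1, c1)
t2 = [(1, 10), (1, 10), (3, 30), (2, 99)]   # (id2, c2), with one duplicate

raw = hash_join(t1, t2)

# Stand-in for HashAggregate with Group Key (id2, c2): keep one row per key.
unique_t2 = list(dict.fromkeys(t2))
deduped = hash_join(t1, unique_t2)

print(raw)      # the outer row (1, 10) is emitted twice
print(deduped)  # each matching outer row is emitted once
```

Only the de-duplicated variant matches what the IN query should return for these rows.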

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Armor

yupengstone@qq.com

over 9 years ago

In reply to: Robert Haas (#2)

Re: One question about transformation ANY Sublinks into joins

After we pull up this sublink as a semi join, the optimizer, when building the join rel for that semi join, also considers a plain hash join if a unique path can be created for the RHS; for details, see make_join_rel in src/backend/optimizer/path/joinrels.c.
In this case, the estimated cost of the plain hash join is lower than that of the semi join, so the planner chose the hash join rather than the semi join.
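
As a rough illustration of the two alternatives, here is a plain-Python sketch (not PostgreSQL code; the function names and sample rows are hypothetical). One function is a hash semi join, which emits an outer row as soon as any inner match exists; the other makes the inner side unique first and then runs an ordinary hash join. Both return the same rows, so choosing between them is purely a matter of estimated cost.

```python
def hash_semi_join(outer_rows, inner_rows):
    """Emit each outer row at most once if any inner row has the same key."""
    keys = set(inner_rows)              # build: only key presence matters
    return [row for row in outer_rows if row in keys]

def unique_then_hash_join(outer_rows, inner_rows):
    """Make the inner side unique first, then run an ordinary hash join."""
    unique_inner = list(dict.fromkeys(inner_rows))   # unique-ify step
    table = {}
    for row in unique_inner:            # build phase
        table.setdefault(row, []).append(row)
    out = []
    for row in outer_rows:              # probe phase
        out.extend(row for _ in table.get(row, []))
    return out

t1 = [(1, 10), (2, 20), (3, 30), (4, 40)]           # (id1, c1)
t2 = [(1, 10), (1, 10), (3, 30), (3, 30), (5, 40)]  # (id2, c2)

semi = hash_semi_join(t1, t2)
uniq = unique_then_hash_join(t1, t2)
print(semi == uniq, semi)
```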

------------------
Jerry Yu
https://github.com/scarbrofair
