Using IN with subselect

Started by Dave Smithover 21 years ago7 messagesgeneral
Jump to latest
#1Dave Smith
dave.smith@candata.com

I have a query with an in subquery like

where x in (select x from y);

Now the subquery is not related to the outer query so it always returns
the same set. Is this subselect executed each time or just once? If it
is executed each time, if I create a function would that then be only
executed once?

--
Dave Smith
CANdata Systems Ltd
416-493-9020

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Dave Smith (#1)
Re: Using IN with subselect

Running EXPLAIN over the query will tell you...

On Thu, Nov 25, 2004 at 11:49:06AM -0500, Dave Smith wrote:

I have a query with an in subquery like

where x in (select x from y);

Now the subquery is not related to the outer query so it always returns
the same set. Is this subselect executed each time or just once? If it
is executed each time, if I create a function would that then be only
executed once?

--
Dave Smith
CANdata Systems Ltd
416-493-9020

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#3Dave Smith
dave.smith@candata.com
In reply to: Martijn van Oosterhout (#2)
Re: Using IN with subselect

Well here is explain. I would guess that it is executed each time ..
function any different?

HashAggregate (cost=288.32..288.32 rows=1 width=32)
-> Hash IN Join (cost=288.18..288.31 rows=1 width=32)
Hash Cond: (("outer".gl_num)::text =
lpad(ltrim(("inner".account_num)::text, '0'::text), 9, ' '::text))
-> Subquery Scan journal_all (cost=282.36..282.45 rows=2
width=64)
-> Unique (cost=282.36..282.43 rows=2 width=159)
-> Sort (cost=282.36..282.36 rows=2 width=159)
Sort Key: objectid, owner_oid, source_code,
posting_date, control_num, reference, gl_num, gl_amt, distributed_amt,
"comment", operator_id, branch_id, company_id
-> Append (cost=0.00..282.35 rows=2
width=159)
-> Subquery Scan "*SELECT* 1"
(cost=0.00..265.24 rows=1 width=159)
-> Index Scan using journal_9 on
journal (cost=0.00..265.23 rows=1 width=159)
Index Cond: (company_id =
1000)
Filter: ((posting_date >=
'2004-01-01'::date) AND (posting_date <= '2004-01-31'::date))
-> Subquery Scan "*SELECT* 2"
(cost=0.00..17.10 rows=1 width=159)
-> Index Scan using
journal_hist_7 on journal_hist (cost=0.00..17.09 rows=1 width=159)
Index Cond: (company_id =
1000)
Filter: ((posting_date >=
'2004-01-01'::date) AND (posting_date <= '2004-01-31'::date))
-> Hash (cost=5.83..5.83 rows=1 width=13)
-> Index Scan using glmast_index3 on glmast
(cost=0.00..5.83 rows=1 width=13)
Index Cond: ((company_id = 1000) AND
((control_type)::text = 'F'::text))

On Thu, 2004-11-25 at 12:11, Martijn van Oosterhout wrote:

Running EXPLAIN over the query will tell you...

On Thu, Nov 25, 2004 at 11:49:06AM -0500, Dave Smith wrote:

I have a query with an in subquery like

where x in (select x from y);

Now the subquery is not related to the outer query so it always returns
the same set. Is this subselect executed each time or just once? If it
is executed each time, if I create a function would that then be only
executed once?

--
Dave Smith
CANdata Systems Ltd
416-493-9020

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

--
Dave Smith
CANdata Systems Ltd
416-493-9020

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Dave Smith (#3)
Re: Using IN with subselect

On Thu, Nov 25, 2004 at 12:19:23PM -0500, Dave Smith wrote:

Well here is explain. I would guess that it is executed each time ..
function any different?

Hmm, if it's only executed once I would expect it to be an InitPlan.
Are you sure it's not correlated?

If you want to make sure, put the subquery in the FROM clause, that it
definitly will only be run once.

<snip>

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Smith (#3)
Re: Using IN with subselect

Dave Smith <dave.smith@candata.com> writes:

Well here is explain. I would guess that it is executed each time ..
function any different?

HashAggregate (cost=288.32..288.32 rows=1 width=32)
-> Hash IN Join (cost=288.18..288.31 rows=1 width=32)
-> Subquery Scan journal_all (cost=282.36..282.45 rows=2 width=64)
-> Hash (cost=5.83..5.83 rows=1 width=13)
-> Index Scan using glmast_index3 on glmast (cost=0.00..5.83 rows=1 width=13)

No ... this plan says to scan glmast once, load the selected rows into
an in-memory hash table, then scan journal_all once and probe the hash
table for matches. It looks like a pretty decent choice of plan to me.

regards, tom lane

#6Dave Smith
dave.smith@candata.com
In reply to: Tom Lane (#5)
Re: Using IN with subselect

That's what I wanted it to do I just did not understand how to read the
explain. So is it HashAggregate that means this already loaded?

On Thu, 2004-11-25 at 12:57, Tom Lane wrote:

Dave Smith <dave.smith@candata.com> writes:

Well here is explain. I would guess that it is executed each time ..
function any different?

HashAggregate (cost=288.32..288.32 rows=1 width=32)
-> Hash IN Join (cost=288.18..288.31 rows=1 width=32)
-> Subquery Scan journal_all (cost=282.36..282.45 rows=2 width=64)
-> Hash (cost=5.83..5.83 rows=1 width=13)
-> Index Scan using glmast_index3 on glmast (cost=0.00..5.83 rows=1 width=13)

No ... this plan says to scan glmast once, load the selected rows into
an in-memory hash table, then scan journal_all once and probe the hash
table for matches. It looks like a pretty decent choice of plan to me.

regards, tom lane

--
Dave Smith
CANdata Systems Ltd
416-493-9020

#7Bruce Momjian
bruce@momjian.us
In reply to: Dave Smith (#6)
Re: Using IN with subselect

Dave Smith <dave.smith@candata.com> writes:

That's what I wanted it to do I just did not understand how to read the
explain. So is it HashAggregate that means this already loaded?

The HashAggregate Node is doing a GROUP BY command. (or the implicit GROUP BY
if you used an aggregate function without one.)

The "Hash IN Join" is a particular type of join that behaves the way Tom
describes. Each type of join behaves differently. A Nested Loop join would
execute one side once and the other many times. A Merge join executes both
sides once.

You could also run "explain analyze" instead of just "explain" and look at the
"loops" number that would tell you how many times the node actually was
executed.

Dave Smith <dave.smith@candata.com> writes:

Well here is explain. I would guess that it is executed each time ..
function any different?

HashAggregate (cost=288.32..288.32 rows=1 width=32)
-> Hash IN Join (cost=288.18..288.31 rows=1 width=32)
-> Subquery Scan journal_all (cost=282.36..282.45 rows=2 width=64)
-> Hash (cost=5.83..5.83 rows=1 width=13)
-> Index Scan using glmast_index3 on glmast (cost=0.00..5.83 rows=1 width=13)

No ... this plan says to scan glmast once, load the selected rows into
an in-memory hash table, then scan journal_all once and probe the hash
table for matches. It looks like a pretty decent choice of plan to me.

--
greg