Left joining table to setof function

Started by Robert Fitzpatrickover 18 years ago3 messagesgeneral
Jump to latest
#1Robert Fitzpatrick
lists@webtent.net

I have a function that returns a set of a custom type...

trax=# \d client_search
Composite type "public.client_search"
Column | Type
----------------+------------------------
clientno | bigint
client | character varying(100)
contactno | bigint
city | character varying(50)
state | character varying(50)
contact_info | text
contact_title | character varying(100)
contact_phone | character varying(20)
contact_cell | character varying(20)
clientnamesort | text
lastnamesort | text

...and need to left join this with a table that can tell me if each
record exists in that table depending on certain values. When I get to
tens of thousands of records returned by the function, while my runtime
of the function itself is little changed, I am struggling to lower the
runtime as it gets very high when there are more and more matches in the
join. I have gone through to make sure all my comparisons in the join
are of same type, here is the table I am joining...

trax=# \d tblsearch_selections Table "public.tblsearch_selections"
Column | Type | Modifiers
---------------------+-----------------------+------------------------------------------------------------------------------------
search_selection_id | bigint | not null default nextval('tblsearch_selections_search_selection_id_seq'::regclass)
search_id | integer |
user_id | character varying(12) |
selection_value | text |
selected_value | bigint |
selection_type | character varying(10) | default 'client'::character varying
Indexes:
"tblsearch_selections_pkey" PRIMARY KEY, btree (search_selection_id)
"search_selection_unique_idx" UNIQUE, btree (search_id, user_id, selection_value, selection_type)
Foreign-key constraints:
"tblsearch_selections_search_fk" FOREIGN KEY (search_id) REFERENCES tblclientsearch(search_id) ON UPDATE CASCADE ON DELETE CASCADE
"tblsearch_selections_user_fk" FOREIGN KEY (user_id) REFERENCES tbluser(flduserid) ON UPDATE CASCADE ON DELETE CASCADE

Here is the analysis of the query with over 12000 join matches...

trax=# explain analyze select * from client_search_id_func(62) left join tblsearch_selections ON search_id = 62 and user_id = 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and selection_type = 'contact'));
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=301.98..9434.48 rows=1000 width=426) (actual time=288.130..234578.634 rows=18202 loops=1)
Join Filter: (((tblsearch_selections.selected_value = client_search_id_func.clientno) AND ((tblsearch_selections.selection_type)::text = 'client'::text)) OR ((tblsearch_selections.selected_value = client_search_id_func.contactno) AND ((tblsearch_selections.selection_type)::text = 'contact'::text)))
-> Function Scan on client_search_id_func (cost=0.00..12.50 rows=1000 width=382) (actual time=253.805..265.752 rows=18202 loops=1)
-> Materialize (cost=301.98..305.02 rows=304 width=44) (actual time=0.001..4.931 rows=12747 loops=18202)
-> Seq Scan on tblsearch_selections (cost=0.00..301.68 rows=304 width=44) (actual time=0.040..12.749 rows=12747 loops=1)
Filter: ((search_id = 62) AND ((user_id)::text = 'RF'::text))
Total runtime: 234673.875 ms
(7 rows)

Perhaps I need to use subselects or some other approach?

trax=# explain analyze select *, (select search_selection_id from tblsearch_selections where search_id = 62 and user_id = 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and selection_type = 'contact'))) from client_search_id_func(62);
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Function Scan on client_search_id_func (cost=0.00..315837.50 rows=1000 width=382) (actual time=259.520..148524.590 rows=18202 loops=1)
SubPlan
-> Seq Scan on tblsearch_selections (cost=0.00..315.82 rows=1 width=8) (actual time=3.555..8.131 rows=1 loops=18202)
Filter: ((search_id = 62) AND ((user_id)::text = 'RF'::text) AND (((selected_value = $0) AND ((selection_type)::text = 'client'::text)) OR ((selected_value = $1) AND ((selection_type)::text = 'contact'::text))))
Total runtime: 148540.593 ms

And with no matches, the winner so far is the left join...

trax=# delete from tblsearch_selections where search_id = 62;
DELETE 12747
trax=# explain analyze select * from client_search_id_func(62) left join tblsearch_selections ON search_id = 62 and user_id = 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and selection_type = 'contact'));
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=513.90..384586.40 rows=1000 width=426) (actual time=253.248..308.741 rows=18202 loops=1)
Join Filter: (((tblsearch_selections.selected_value = client_search_id_func.clientno) AND ((tblsearch_selections.selection_type)::text = 'client'::text)) OR ((tblsearch_selections.selected_value = client_search_id_func.contactno) AND ((tblsearch_selections.selection_type)::text = 'contact'::text)))
-> Function Scan on client_search_id_func (cost=0.00..12.50 rows=1000 width=382) (actual time=251.167..258.764 rows=18202 loops=1)
-> Materialize (cost=513.90..641.92 rows=12802 width=44) (actual time=0.001..0.001 rows=0 loops=18202)
-> Seq Scan on tblsearch_selections (cost=0.00..501.10 rows=12802 width=44) (actual time=2.066..2.066 rows=0 loops=1)
Filter: ((search_id = 62) AND ((user_id)::text = 'RF'::text))
Total runtime: 319.371 ms
(7 rows)

Thanks for any help!

--
Robert

#2Bruce Momjian
bruce@momjian.us
In reply to: Robert Fitzpatrick (#1)
Re: Left joining table to setof function

"Robert Fitzpatrick" <lists@webtent.net> writes:

Indexes:
"tblsearch_selections_pkey" PRIMARY KEY, btree (search_selection_id)
"search_selection_unique_idx" UNIQUE, btree (search_id, user_id, selection_value, selection_type)

SELECT *
FROM client_search_id_func(62)
LEFT JOIN tblsearch_selections ON search_id = 62
AND user_id = 'RF'
AND (( selected_value = clientno
AND selection_type = 'client'
) OR (
selected_value = contactno
AND selection_type = 'contact'))

According to the index the unique way to identify a record is (search_id,
user_id, selection_value, selection_type). But your join condition is matching
on (search_id, user_id, ***selected_value***, selection_type).

I'm not sure what these columns are but this looks like a bug. Certainly it
will make PostgreSQL less likely to use the index since it can only use the
first two columns of it which leaves it with few options.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Fitzpatrick (#1)
Re: Left joining table to setof function

Robert Fitzpatrick <lists@webtent.net> writes:

trax=# explain analyze select * from client_search_id_func(62) left join tblsearch_selections ON search_id = 62 and user_id = 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and selection_type = 'contact'));

I don't think you're going to get anywhere until you get rid of the
OR-of-ANDs join condition; the OR defeats using any reasonably
intelligent join method such as a merge or hash join. You should
redesign the set-returning function to emit a representation that
matches the table you want to search, or perhaps rethink the table's
representation.

regards, tom lane