Three types of functions, ala function redux.

Started by mlwabout 25 years ago7 messages
#1mlw
markw@mohawksoft.com

Given this basic SQL statement:

select * from table where col = function() ;

There are three basic types of SQL behaviors that should be able to be
performed.

(1) "function()" returns a single value. Postgres should be able to
understand how to optimize this to be: "select * from table where col =
value" where value is the datum returned by function.

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

(3) "function()" returns a value based on the query. (This seems to be
how it currently functions.) where "select * from table where col =
function()" will end up doing a full table scan.

(1) and (2) are related, and could probably be implemented using the
same code.
(3) Seems to be how Postgres is currently optimized.

It seems like Tom Lane laid the foundation for this behavior in 7.1
newC. (Does it now work this way?)

Does anyone see a problem with this thinking, and does it make sense to
attempt this for 7.2? I am looking into the function manager stuff to
see what would be involved.

--
http://www.mohawksoft.com

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: mlw (#1)
Re: Three types of functions, ala function redux.

[I was having trouble with the direct address so i'm only sending to
the list]

select * from table where col = function() ;

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

I disagree here. I really don't think that changing = to mean "in"
in the system is a good idea. If the user wants an in they should
specify it.
I think "select * from table where col in (select function());" or
"select * from table where col in (select * from function());" or
even "select * from table where col in function();"
are better ways of specifying this sort of behavior.

If we do that (col = <function returning set>) meaning in, then does
col = (select statement that returns multiple rows) mean in and what
about col = <array>? I think doing it only for the function case is
a mistake.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: mlw (#1)
Re: Three types of functions, ala function redux.

mlw <markw@mohawksoft.com> writes:

There are three basic types of SQL behaviors that should be able to be
performed.

(1) "function()" returns a single value. Postgres should be able to
understand how to optimize this to be: "select * from table where col =
value" where value is the datum returned by function.

You get this now if the function is marked proiscachable.

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

I object to the notion that "scalar = set" should be automatically
transformed into "scalar IN set". It would be nice to be smarter about
optimizing IN operations where the subselect only returns a few rows
into multiple indexscans, but how should the planner know that in advance?

(3) "function()" returns a value based on the query. (This seems to be
how it currently functions.) where "select * from table where col =
function()" will end up doing a full table scan.

You get this now if the function is not marked proiscachable.

regards, tom lane

#4Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Tom Lane (#3)
AW: Three types of functions, ala function redux.

select * from table where col = function() ;

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

I think the above needs a different sql statement to begin with.
The "= function()" clearly states that function is only allowed to return one row.

The following syntax currently works, and is imho sufficient:
select * from table where col in (select function());

Andreas

#5mlw
markw@mohawksoft.com
In reply to: Zeugswetter Andreas SB (#4)
Re: AW: Three types of functions, ala function redux.

Zeugswetter Andreas SB wrote:

select * from table where col = function() ;

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

I think the above needs a different sql statement to begin with.
The "= function()" clearly states that function is only allowed to return one row.

The following syntax currently works, and is imho sufficient:
select * from table where col in (select function());

Both syntaxes work, but always force a table scan. If you have an index
on 'col' it will not be used. If your table has millions of records,
this takes time.

--
http://www.mohawksoft.com

#6Michael Fork
mfork@toledolink.com
In reply to: mlw (#5)
Re: AW: Three types of functions, ala function redux.

Acutally, a function can use an index scan *if* it is marked as cacheable:
(the "test" table has 1 field, col (type is int4), which is populated with
numbers 1 thru 5000)

testdb=# create function func_test_cache (int4) returns int4 as '
testdb'# select $1;
testdb'# ' LANGUAGE 'sql' with (iscachable);
CREATE
testdb=# create function func_test (int4) returns int4 as '
testdb'# select $1;
testdb'# ' LANGUAGE 'sql';
CREATE
testdb=# vacuum analyze;
VACUUM
testdb=# explain select * from test where col = func_test_cache(1);
NOTICE: QUERY PLAN:
Index Scan using idxtest on test (cost=0.00..2.01 rows=1 width=4)
EXPLAIN
testdb=# explain select * from test where col = func_test(1);
NOTICE: QUERY PLAN:
Seq Scan on test (cost=0.00..100.00 rows=1 width=4)
EXPLAIN

Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio

On Thu, 21 Dec 2000, mlw wrote:

Show quoted text

Zeugswetter Andreas SB wrote:

select * from table where col = function() ;

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

I think the above needs a different sql statement to begin with.
The "= function()" clearly states that function is only allowed to return one row.

The following syntax currently works, and is imho sufficient:
select * from table where col in (select function());

Both syntaxes work, but always force a table scan. If you have an index
on 'col' it will not be used. If your table has millions of records,
this takes time.

--
http://www.mohawksoft.com

#7mlw
markw@mohawksoft.com
In reply to: mlw (#1)
Re: Three types of functions, ala function redux.

Tom Lane wrote:

mlw <markw@mohawksoft.com> writes:

There are three basic types of SQL behaviors that should be able to be
performed.

(1) "function()" returns a single value. Postgres should be able to
understand how to optimize this to be: "select * from table where col =
value" where value is the datum returned by function.

You get this now if the function is marked proiscachable.

Doh! RTFM!

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

I object to the notion that "scalar = set" should be automatically
transformed into "scalar IN set". It would be nice to be smarter about
optimizing IN operations where the subselect only returns a few rows
into multiple indexscans, but how should the planner know that in advance?

That is sort of my point. If one marks a function as "Iscachable" and
returns an isDone as false, will postgres keep calling until all values
have been returned, and then use an index scan with the finite (cached?)
set of results?

If so, this is exactly what I need.

(3) "function()" returns a value based on the query. (This seems to be
how it currently functions.) where "select * from table where col =
function()" will end up doing a full table scan.

You get this now if the function is not marked proiscachable.

A lot of my confusion has cleared, the "iscachable" flag is an
enlightenment. Boy am I schmuck. ;-}

--
http://www.mohawksoft.com