Outer where pushed down

Started by Gaetano Mendolaover 20 years ago3 messageshackers

mendola@bigfoot.com

over 20 years ago

Hi all,

consider this view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM user_login ul,
current_connection cc
WHERE ul.id_user = cc.id_user;

And this is the explain on a usage of that view:

# explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Hash Join (cost=42.79..1325.14 rows=451 width=5)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_login ul (cost=0.00..1142.72 rows=27024 width=4)
Filter: (sp_connected_test(id_user) = false)
-> Hash (cost=40.49..40.49 rows=919 width=5)
-> Index Scan using idx_connected on current_connection cc (cost=0.00..40.49 rows=919 width=5)
Index Cond: (connected = true)
(7 rows)

apart my initial surprise to see that function applied at rows not returned by the view
( Tom Lane explained me that the planner is able to push down the outer condition )
why postgres doesn't apply that function at table current_connection given the fact are extimated
only 919 vs 27024 rows?

redefining the view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT cc.id_user
FROM user_login ul,
current_connection cc
WHERE ul.id_user = cc.id_user;

then I obtain the "desidered" plan.

# explain select * from v_current_connection_test where sp_connected_test(id_user ) = FALSE;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Hash Join (cost=46.23..1193.47 rows=452 width=5)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_login ul (cost=0.00..872.48 rows=54048 width=4)
-> Hash (cost=45.08..45.08 rows=460 width=5)
-> Index Scan using idx_connected on current_connection cc (cost=0.00..45.08 rows=460 width=5)
Index Cond: (connected = true)
Filter: (sp_connected_test(id_user) = false)
(7 rows)

Is not possible in any way push postgres to apply that function to the right table ?
Shall I rewrite the views figuring out wich column is better to expose ?

Regards
Gaetano Mendola

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Gaetano Mendola (#1)

Re: Outer where pushed down

Gaetano Mendola <mendola@bigfoot.com> writes:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM user_login ul,
current_connection cc
WHERE ul.id_user = cc.id_user;

# explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;

why postgres doesn't apply that function at table current_connection given the fact are extimated
only 919 vs 27024 rows?

Because the condition is on a field of the other table.

You seem to wish that the planner would use "ul.id_user = cc.id_user"
to decide that "sp_connected_test(ul.id_user)" can be rewritten as
"sp_connected_test(cc.id_user)", but in general this is not safe.
The planner has little idea of what the datatype-specific semantics
of equality are, and none whatsoever what the semantics of your
function are. As a real-world example: IEEE-standard floating
point math considers that +0 and -0 are different bit patterns.
They compare as equal, but it's very easy to come up with user-defined
functions that will yield different results for the two inputs.
So the proposed transformation is definitely unsafe for float8.

regards, tom lane

Gaetano Mendola

mendola@bigfoot.com

over 20 years ago

In reply to: Tom Lane (#2)

Re: Outer where pushed down

Tom Lane wrote:

Gaetano Mendola <mendola@bigfoot.com> writes:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM user_login ul,
current_connection cc
WHERE ul.id_user = cc.id_user;

# explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;

why postgres doesn't apply that function at table current_connection given the fact are extimated
only 919 vs 27024 rows?

Because the condition is on a field of the other table.

You seem to wish that the planner would use "ul.id_user = cc.id_user"
to decide that "sp_connected_test(ul.id_user)" can be rewritten as
"sp_connected_test(cc.id_user)", but in general this is not safe.
The planner has little idea of what the datatype-specific semantics
of equality are, and none whatsoever what the semantics of your
function are. As a real-world example: IEEE-standard floating
point math considers that +0 and -0 are different bit patterns.
They compare as equal, but it's very easy to come up with user-defined
functions that will yield different results for the two inputs.
So the proposed transformation is definitely unsafe for float8.

And what about to define for each type when this is safe and let
the planner make his best choice ?
Rewriting that view the execution time passed from 4 secs to 1 sec,
that is not bad if the planner can do it autonomously. In this very
example I can decide if it's better expose one column or the other
one but in other cases not...

Regards
Gaetano Mendola