Slow queries when functions are inlined

Started by Evan Martin · almost 14 years ago · 4 messages · general
#1 Evan Martin
postgresql@realityexists.net

I've run into a weird query performance problem. I have a large, complex
query which joins the results of several set-returning functions with
some tables and filters them by calling another function, which involves
PostGIS calls (ST_DWithin). This used to run in about 10 seconds until I
changed the functions to allow them to be inlined. (They previously had
"SET search_path FROM current", which prevented inlining.) Now the query
doesn't return in 10 minutes! If I again prevent the filtering function
from being inlined (e.g. by adding STRICT or SECURITY DEFINER or SET) the
time goes down to 20 seconds. If I do the same to one of the
set-returning functions it goes down to 15 seconds. It seems to change
the query plan at the top level: without inlining it picks a Hash Join
or Merge Join (fast), but with inlining it picks a Nested Loop (slow).
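
For reference, here is the kind of change I mean: re-adding any one of these properties blocks inlining of a SQL function again (a sketch, using the test function defined further down):

```sql
-- Any one of these stops the planner from inlining the SQL function:
ALTER FUNCTION __test_points_are_near(geography, geography) STRICT;
-- or
ALTER FUNCTION __test_points_are_near(geography, geography) SECURITY DEFINER;
-- or
ALTER FUNCTION __test_points_are_near(geography, geography)
    SET search_path FROM CURRENT;
```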

I can reproduce the problem with the following simplified test case,
running on PostgreSQL 9.1.3 with PostGIS 2.0.0. (Couldn't get it to
happen without PostGIS, unfortunately.)

CREATE EXTENSION postgis;

CREATE DOMAIN my_timestamp AS timestamp;

CREATE TABLE _test_pos (
    id serial,
    pos geography(Point,4326)
);

CREATE TABLE _test_time (
    id integer,
    some_time my_timestamp
);

-- Don't automatically run ANALYZE
ALTER TABLE _test_pos SET (autovacuum_enabled = false,
    toast.autovacuum_enabled = false);
ALTER TABLE _test_time SET (autovacuum_enabled = false,
    toast.autovacuum_enabled = false);

-- Insert some dummy data

WITH rand AS
(
    SELECT generate_series(1, 20000)::float / 1000 AS x
)
INSERT INTO _test_pos(pos)
SELECT ST_MakePoint(x, x)::geography
FROM rand;

INSERT INTO _test_time (id, some_time)
SELECT id, '2012-05-04'::my_timestamp
FROM _test_pos;

CREATE OR REPLACE FUNCTION __test_get_ids(some_time my_timestamp)
RETURNS SETOF _test_time
AS $BODY$
    SELECT *
    FROM _test_time
    WHERE some_time <= $1
$BODY$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION __test_points_are_near(p1 geography, p2 geography)
RETURNS boolean AS
$BODY$
    SELECT ST_DWithin($1, $2, 300000) OR ST_DWithin($1, $2, 400000);
$BODY$ LANGUAGE SQL IMMUTABLE;

/*
-- It only becomes slow after analyzing
ANALYZE VERBOSE;
*/

/*
-- Delete all stats and it's fast again (must be user "postgres" for
this to work)
DELETE FROM pg_statistic s
USING pg_class c
WHERE c.oid = s.starelid AND c.relname IN ('_test_time', '_test_pos');
*/

-- The problematic query
WITH cte AS
(
    SELECT id
    FROM __test_get_ids('2012-05-15'::my_timestamp)
    JOIN _test_pos USING (id)
)
SELECT id
FROM cte
JOIN _test_pos USING (id)
WHERE __test_points_are_near('POINT(7 7)', pos);

This query is initially fast (140 ms), but after running ANALYZE the
query plan changes from Hash Join to Nested Loop and it takes 15000 ms.
If I delete the table statistics again it goes back to the fast plan.
(This doesn't help on the original complex query, though.) If I mark
__test_points_are_near as STRICT it uses the fast plan. If I remove one
of the ST_DWithin calls it uses the fast plan. Even if I use the
"timestamp" type directly instead of a domain it uses the fast plan. But
with this exact combination of factors it uses the slow plan.

This is the "slow" plan (with inlining):

Nested Loop (cost=1162.01..12106.68 rows=1 width=4)
Join Filter: (cte.id = public._test_pos.id)
CTE cte
-> Hash Join (cost=442.34..1162.01 rows=6667 width=4)
Hash Cond: (public._test_pos.id = _test_time.id)
-> Seq Scan on _test_pos (cost=0.00..428.00 rows=20000 width=4)
-> Hash (cost=359.00..359.00 rows=6667 width=4)
-> Seq Scan on _test_time (cost=0.00..359.00
rows=6667 width=4)
Filter: ((some_time)::timestamp without time zone
<= (('2012-05-15 00:00:00'::timestamp without time
zone)::my_timestamp)::timestamp without time zone)
-> Seq Scan on _test_pos (cost=0.00..10728.00 rows=1 width=4)
Filter:
((('0101000020E61000000000000000001C400000000000001C40'::geography &&
_st_expand(pos, 300000::double precision)) AND (pos &&
'0101000020E61000000000000000001C400000000000001C40'::geography) AND
_st_dwithin('0101000020E61000000000000000001C400000000000001C40'::geography,
pos, 300000::double precision, true)) OR
(('0101000020E61000000000000000001C400000000000001C40'::geography &&
_st_expand(pos, 400000::double precision)) AND (pos &&
'0101000020E61000000000000000001C400000000000001C40'::geography) AND
_st_dwithin('0101000020E61000000000000000001C400000000000001C40'::geography,
pos, 400000::double precision, true)))
-> CTE Scan on cte (cost=0.00..133.34 rows=6667 width=4)

This is the "fast" plan (without inlining):

Hash Join (cost=6673.34..6903.91 rows=2222 width=4)
Hash Cond: (cte.id = public._test_pos.id)
CTE cte
-> Hash Join (cost=442.34..1162.01 rows=6667 width=4)
Hash Cond: (public._test_pos.id = _test_time.id)
-> Seq Scan on _test_pos (cost=0.00..428.00 rows=20000 width=4)
-> Hash (cost=359.00..359.00 rows=6667 width=4)
-> Seq Scan on _test_time (cost=0.00..359.00
rows=6667 width=4)
Filter: ((some_time)::timestamp without time zone
<= (('2012-05-15 00:00:00'::timestamp without time
zone)::my_timestamp)::timestamp without time zone)
-> CTE Scan on cte (cost=0.00..133.34 rows=6667 width=4)
-> Hash (cost=5428.00..5428.00 rows=6667 width=4)
-> Seq Scan on _test_pos (cost=0.00..5428.00 rows=6667 width=4)
Filter:
__test_points_are_near('0101000020E61000000000000000001C400000000000001C40'::geography,
pos)

Can anyone figure out what is going on here and how I can work around
this properly, i.e. while still allowing functions to be inlined (which
is good in other scenarios)?

Thanks,

Evan

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Evan Martin (#1)
Re: Slow queries when functions are inlined

Evan Martin <postgresql@realityexists.net> writes:

I've run into a weird query performance problem. I have a large, complex
query which joins the results of several set-returning functions with
some tables and filters them by calling another function, which involves
PostGIS calls (ST_DWithin). This used to run in about 10 seconds until I
changed the functions to allow them to be inlined. (They previously had
"SET search_path FROM current", which prevented inlining.) Now the query
doesn't return in 10 minutes!

You didn't show EXPLAIN ANALYZE results, but I see that one query is
estimating that 6667 rows from _test_pos pass the filter, while the
other thinks only 1 row passes; that changes the planner's ideas about
how to do the join, and evidently not for the better. In the case of
the opaque user-defined function, you're just getting a default
selectivity estimate, and it's really just blind luck if that is close
to reality. But in the other case it should be invoking
PostGIS-provided selectivity estimation functions, and apparently those
are giving poor results. I think you'd be best off to ask about that
on the PostGIS mailing lists.

regards, tom lane

#3 Evan Martin
postgresql@realityexists.net
In reply to: Tom Lane (#2)
Re: Slow queries when functions are inlined

Thanks, Tom. You mean this bit, right?

-> Seq Scan on _test_pos (cost=0.00..10728.00 rows=1 width=4)
Filter:
((('0101000020E61000000000000000001C400000000000001C40'::geography &&
_st_expand(pos, 300000::double precision)) AND ...

I tried to find some info on selectivity estimation functions, but only
came up with
http://www.postgresql.org/docs/9.1/static/xoper-optimization.html which
talks about operators. Is there something similar for functions? Or does
the rows estimate come from the PostGIS && operator that's used
internally by ST_DWithin? Just trying to understand this better so I
know what to ask on the PostGIS list.
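
One thing I did notice: the 6667-row estimate for the non-inlined function
filter is exactly 20000 / 3, so I assume that is just a flat default
selectivity of 1/3 for an opaque boolean function. A sketch of how I
checked, using the test case above:

```sql
-- With inlining blocked (e.g. by marking the function STRICT),
-- the filter gets a default row estimate rather than a PostGIS one:
EXPLAIN
SELECT id
FROM _test_pos
WHERE __test_points_are_near('POINT(7 7)', pos);
-- -> Seq Scan on _test_pos  (... rows=6667 ...)   i.e. 20000 * 1/3
```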

Thanks,

Evan

On 17/05/2012 12:31 AM, Tom Lane wrote:


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Evan Martin (#3)
Re: Slow queries when functions are inlined

Evan Martin <postgresql@realityexists.net> writes:

Thanks, Tom. You mean this bit, right?
-> Seq Scan on _test_pos (cost=0.00..10728.00 rows=1 width=4)
Filter:
((('0101000020E61000000000000000001C400000000000001C40'::geography &&
_st_expand(pos, 300000::double precision)) AND ...

I tried to find some info on selectivity estimation functions, but only
came up with
http://www.postgresql.org/docs/9.1/static/xoper-optimization.html which
talks about operators. Is there something similar for functions? Or does
the rows estimate come from the PostGIS && operator that's used
internally by ST_DWithin? Just trying to understand this better so I
know what to ask on the PostGIS list.

Yeah, the _st_dwithin calls are probably also getting default estimates,
so this must be mostly on the head of the && calls. You could try
EXPLAIN ANALYZEs with just these individual filter conditions to see
which ones are far off.
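
Something along these lines, perhaps (a sketch; the internal _st_expand
and _st_dwithin calls are copied from the slow plan's filter):

```sql
-- Compare estimated vs. actual row counts for each condition in isolation:
EXPLAIN ANALYZE
SELECT id FROM _test_pos
WHERE 'POINT(7 7)'::geography && _st_expand(pos, 300000);

EXPLAIN ANALYZE
SELECT id FROM _test_pos
WHERE pos && 'POINT(7 7)'::geography;

EXPLAIN ANALYZE
SELECT id FROM _test_pos
WHERE _st_dwithin('POINT(7 7)'::geography, pos, 300000, true);
```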

regards, tom lane