SQL code runs slower as a stored function

Started by S G · almost 16 years ago · 9 messages · general
#1 S G
sgennaria2@gmail.com

Hi all,

It's my first time posting to the list, though it's helped me through
numerous jams throughout the past year as I've been familiarizing myself
with all of the wonderful postgresisms =)

Just yesterday, I made what I thought would be a very minor and routine
modification to a stored function. The function aggregates over several
tables-worth of network packet data and generates some very simple
statistics on packet loss (packets sent vs packets received). Initially,
the function ran over an entire set of such data, but then I wanted to
change it to aggregate only the data within a specific time window, as
necessary.

So I added 2 timestamp parameters to the function (start time and stop time)
and utilized them in an appropriate WHERE clause to limit the dataset being
aggregated, and suddenly the function takes much longer to run. For
example, running the original un-windowed function over ~2hrs worth of
packet data took a steady ~90ms. Running the new windowed version over the
same data now takes ~15s. The result sets are identical.

So, baffled, I decided to take the slow-running code out of the stored
function and run it as a straight sql query, replacing all parameter
variables with static values. To my surprise, it only took ~90ms again.
The result sets were all still identical.

Can anyone lend a guess as to what I'm running into here, or do I need to
provide more specifics to recreate the issue? It's repeatable, but it's a
fair bit of data for me to just post in here as-is. I've already discovered
a few creative workarounds (e.g. plpgsql: return query execute ...) that
make it run faster again, but at this point, I'm really more interested in
finding out what would make sql code run so much slower as a stored function
in the first place.

Thanks!
sg

#2 Raymond O'Donnell
rod@iol.ie
In reply to: S G (#1)
Re: SQL code runs slower as a stored function

On 13/05/2010 16:59, S G wrote:

Can anyone lend a guess as to what I'm running into here, or do I need
to provide more specifics to recreate the issue? It's repeatable, but
it's a fair bit of data for me to just post in here as-is. I've already
discovered a few creative workarounds (e.g. plpgsql: return query
execute ...) that make it run faster again, but at this point, I'm
really more interested in finding out what would make sql code run so
much slower as a stored function in the first place.

It would help if you post the output from EXPLAIN ANALYZE for each case,
as this will hopefully show where the bottleneck is.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#3 Stephen Frost
sfrost@snowman.net
In reply to: S G (#1)
Re: SQL code runs slower as a stored function

S G,

* S G (sgennaria2@gmail.com) wrote:

Can anyone lend a guess as to what I'm running into here, or do I need to
provide more specifics to recreate the issue? It's repeatable, but it's a
fair bit of data for me to just post in here as-is. I've already discovered
a few creative workarounds (e.g. plpgsql: return query execute ...) that
make it run faster again, but at this point, I'm really more interested in
finding out what would make sql code run so much slower as a stored function
in the first place.

The planner doesn't know what values those variables can take when
you're passing them into a function. Therefore, it tries to come up
with a plan which will work in the 'general' case, and then it will
store that plan and reuse it. When there are static values, it can
construct a better plan. If you're using partitioning on the table,
that can mean the difference between a plan that has to scan all parts
of the table and a plan that only has to scan the one part of the table
that matches the constraint. Using 'execute' will cause the planner to
re-plan the query every time using the static values that you've put
into the query.
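
As an illustration (the table and function names below are invented, not
from your schema), the two approaches look roughly like this:

```sql
-- Hypothetical example: the parameterized query inside a plpgsql function
-- is planned once, with the parameters treated as unknown values.
CREATE FUNCTION packet_stats_static(t_start timestamp, t_stop timestamp)
RETURNS bigint AS $$
BEGIN
    -- Planned generically: the planner cannot use the actual window bounds.
    RETURN (SELECT count(*) FROM packets
            WHERE sent_time BETWEEN t_start AND t_stop);
END;
$$ LANGUAGE plpgsql;

-- Workaround: EXECUTE builds the query text at run time, so it is
-- re-planned on each call with the concrete values substituted in.
CREATE FUNCTION packet_stats_dynamic(t_start timestamp, t_stop timestamp)
RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    EXECUTE 'SELECT count(*) FROM packets WHERE sent_time BETWEEN '
            || quote_literal(t_start) || ' AND ' || quote_literal(t_stop)
    INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

The second form trades per-call planning overhead for a plan tailored to
the actual values, which usually wins when the values are very selective.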

Thanks,

Stephen

#4 S G
sgennaria2@gmail.com
In reply to: Stephen Frost (#3)
Re: SQL code runs slower as a stored function

Thanks Stephen, that makes a lot of sense based on some research I was doing
yesterday.

I'm still a bit too green to understand the inner workings of the planner
and all the stuff coming out of EXPLAIN, but I'm definitely trying to keep
it all in mind as I progress at this stuff.

I guess to really get down to the issue, I'm curious if what I'm doing is
considered 'standard procedure' to others-- i.e. using funny workarounds
like building the query in a text var and executing it with plpgsql's RETURN
QUERY EXECUTE command. Are there other schools of thought on how to
approach something like this? Could it be more of a sign that my design is
flawed? Something in me just feels like there should be a better way to
approach this. Otherwise I feel like I'm just blaming postgres for the
problem, which I'm not so sure I want to do.
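
For concreteness, the workaround I'm describing looks roughly like this
(a simplified sketch with invented table and column names, not my real
schema):

```sql
-- Simplified shape of the RETURN QUERY EXECUTE workaround (names invented).
-- Building the query text forces a fresh plan using the actual values.
CREATE OR REPLACE FUNCTION loss_stats(t_start timestamp, t_stop timestamp)
RETURNS TABLE(sent bigint, received bigint) AS $$
BEGIN
    RETURN QUERY EXECUTE
        'SELECT count(s.*), count(r.*)
           FROM sent_packets s
           LEFT JOIN recv_packets r USING (sequence_num)
          WHERE s.sent_time >= ' || quote_literal(t_start) || '
            AND s.sent_time <= ' || quote_literal(t_stop);
END;
$$ LANGUAGE plpgsql;
```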

Also re: Raymond's request, I tried humoring myself with the EXPLAIN output,
and I actually don't see anything useful-looking at all when I run it on a
stored function like I'm using. Is there any way to utilize EXPLAIN on a
query embedded in a stored function? I could run it just fine on the raw
sql, but the raw sql wasn't what was running slow, so I'm not sure if it's
even helpful to do that.

Thanks!


#5 Greg Sabino Mullane
greg@turnstep.com
In reply to: S G (#4)
Re: SQL code runs slower as a stored function


Also re: Raymond's request, I tried humoring myself with the EXPLAIN output,
and I actually don't see anything useful-looking at all when I run it on a
stored function like I'm using. Is there any way to utilize EXPLAIN on a
query embedded in a stored function? I could run it just fine on the raw
sql, but the raw sql wasn't what was running slow, so I'm not sure if it's
even helpful to do that.

I wrote a blog article about this: Google for "Why is my function slow?"

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005131251
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

#6 Stephen Frost
sfrost@snowman.net
In reply to: S G (#4)
Re: SQL code runs slower as a stored function

S G,

* S G (sgennaria2@gmail.com) wrote:

I guess to really get down to the issue, I'm curious if what I'm doing is
considered 'standard procedure' to others-- i.e. using funny workarounds
like building the query in a text var and executing it with plpgsql's RETURN
QUERY EXECUTE command.

It's certainly not unusual when it's necessary. We do try to minimize
the times it's necessary by making the planner as smart as possible.

Are there other schools of thought on how to
approach something like this? Could it be more of a sign that my design is
flawed? Something in me just feels like there should be a better way to
approach this. Otherwise I feel like I'm just blaming postgres for the
problem, which I'm not so sure I want to do.

I don't think needing to do this would imply that the design is flawed.

Also re: Raymond's request, I tried humoring myself with the EXPLAIN output,
and I actually don't see anything useful-looking at all when I run it on a
stored function like I'm using. Is there any way to utilize EXPLAIN on a
query embedded in a stored function? I could run it just fine on the raw
sql, but the raw sql wasn't what was running slow, so I'm not sure if it's
even helpful to do that.

You can try putting the explain *into* the stored procedure, and then
putting its results into a text variable and then spitting that back
out. You can also play with setting it up as a prepared statement
and then running explain/explain analyze on that; e.g.:

prepare x as select * from table where column1 = $1;
explain execute x('2010-01-01');

That would hopefully give you the same plan as what's happening in the
stored procedure.

Thanks,

Stephen

#7 S G
sgennaria2@gmail.com
In reply to: Greg Sabino Mullane (#5)
Re: SQL code runs slower as a stored function

I wrote a blog article about this: Google for "Why is my function slow?"

Greg, great article. It's funny and humbling how I thought I'd invented the
wheel with this workaround, but now it suddenly appears to be common
practice. I guess I'm on the right track.

I'm curious though-- your examples were plpgsql and you mentioned at the
bottom that the plpgsql documentation mentions the one-time plan
limitation. Does the same hold true for stored functions in sql? My slow
function in this case is written in sql, so I'm guessing the answer is
'yes'.
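
From what I've been able to gather so far, the answer seems to be 'yes'
unless the planner can inline the function body into the calling query, in
which case it would be planned with the real argument values. A trivially
inlinable shape (with invented names, just to sketch what I mean) would be
something like:

```sql
-- A single-SELECT, STABLE sql-language function of this shape can be
-- inlined into the caller and planned with the actual argument values
-- (table and column names are invented for illustration).
CREATE FUNCTION packets_in_window(t_start timestamp, t_stop timestamp)
RETURNS SETOF tb_packets AS $$
    SELECT * FROM tb_packets
    WHERE sent_time >= $1 AND sent_time <= $2;
$$ LANGUAGE sql STABLE;
```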

#8 S G
sgennaria2@gmail.com
In reply to: Raymond O'Donnell (#2)
Re: SQL code runs slower as a stored function

It would help if you post the output from EXPLAIN ANALYZE for each case,
as this will hopefully show where the bottleneck is.

I'm glad you asked, and I think I've discussed enough about what *could be*
happening that I'd like to get my hands dirty with what's really going on.
Maybe now I can learn how to read and interpret all this stuff. I will
present 3 EXPLAIN ANALYZE cases:
1) slow with variables: same query structure as example 2
2) fast with static values: same query structure as example 1
3) fast with variables: I changed where I put the constraint, which allowed
me to keep this as a normal sql stored function without relying on the
plpgsql RETURN QUERY EXECUTE fix

1) slow with variables: same query structure as example 2

'CTE Scan on subquery2 (cost=4966.56..4966.59 rows=1 width=88) (actual time=5616.201..5616.202 rows=1 loops=1)'
'  CTE subquery2'
'    -> HashAggregate (cost=4966.54..4966.56 rows=1 width=44) (actual time=5616.178..5616.179 rows=1 loops=1)'
'       CTE subquery'
'         -> Sort (cost=4966.51..4966.51 rows=1 width=346) (actual time=5612.507..5612.726 rows=1810 loops=1)'
'            Sort Key: tb_blackhawk_sent.sent_date_time'
'            Sort Method: quicksort Memory: 870kB'
'            -> Nested Loop Left Join (cost=813.78..4966.50 rows=1 width=346) (actual time=10.012..5608.011 rows=1810 loops=1)'
'               Join Filter: (tb_blackhawk_sent.pcap_sent_date_time = tb_blackhawk.pcap_sent_date_time)'
'               -> Nested Loop (cost=813.78..3367.26 rows=1 width=229) (actual time=4.698..24.509 rows=1810 loops=1)'
'                  Join Filter: ((tb_blackhawk_sent.pcap_sent_date_time >= sf_group_join_times_for_collector_in_test.join_time) AND (tb_blackhawk_sent.pcap_sent_date_time <= sf_group_join_times_for_collector_in_test.stop_time))'
'                  -> Function Scan on sf_group_join_times_for_collector_in_test (cost=0.00..265.00 rows=1 width=92) (actual time=2.946..2.957 rows=2 loops=1)'
'                     Filter: (((group_ip)::text = $3) AND (listen_port = $4))'
'                  -> Bitmap Heap Scan on tb_blackhawk_sent (cost=813.78..3102.11 rows=10 width=137) (actual time=1.661..9.843 rows=1862 loops=2)'
'                     Recheck Cond: (tb_blackhawk_sent.test_id = $1)'
'                     Filter: ((tb_blackhawk_sent.pcap_sent_date_time >= $5) AND (tb_blackhawk_sent.pcap_sent_date_time <= $6) AND ((tb_blackhawk_sent.destination_ip)::text = $3) AND (tb_blackhawk_sent.destination_port = $4))'
'                     -> Bitmap Index Scan on tb_blackhawk_sent_pk2 (cost=0.00..813.77 rows=13526 width=0) (actual time=1.625..1.625 rows=13031 loops=2)'
'                        Index Cond: (tb_blackhawk_sent.test_id = $1)'
'               -> Index Scan using tb_blackhawk_pk2 on tb_blackhawk (cost=0.00..1599.22 rows=1 width=117) (actual time=0.678..3.083 rows=1 loops=1810)'
'                  Index Cond: ((tb_blackhawk.test_id = $1) AND (tb_blackhawk_sent.sequence_num = tb_blackhawk.sequence_num) AND ((tb_blackhawk.collector_ip)::text = ($2)::text) AND ((tb_blackhawk_sent.collector_ip)::text = (tb_blackhawk.source_ip)::text))'
'                  Filter: (((tb_blackhawk.destination_ip)::text = $3) AND (tb_blackhawk.destination_port = $4))'
'    -> CTE Scan on subquery (cost=0.00..0.02 rows=1 width=44) (actual time=5612.513..5615.248 rows=1810 loops=1)'
'Total runtime: 5616.395 ms'

2) fast with static values: same query structure as example 1

'CTE Scan on subquery2 (cost=14396.95..14403.95 rows=200 width=88) (actual time=46.243..46.244 rows=1 loops=1)'
'  CTE subquery2'
'    -> HashAggregate (cost=14393.95..14396.95 rows=200 width=44) (actual time=46.222..46.223 rows=1 loops=1)'
'       CTE subquery'
'         -> Sort (cost=14353.35..14356.25 rows=1160 width=346) (actual time=42.338..42.532 rows=1810 loops=1)'
'            Sort Key: tb_blackhawk_sent.sent_date_time'
'            Sort Method: quicksort Memory: 870kB'
'            -> Hash Left Join (cost=11315.16..14294.31 rows=1160 width=346) (actual time=19.476..38.382 rows=1810 loops=1)'
'               Hash Cond: (((tb_blackhawk_sent.collector_ip)::text = (tb_blackhawk.source_ip)::text) AND (tb_blackhawk_sent.sequence_num = tb_blackhawk.sequence_num) AND (tb_blackhawk_sent.pcap_sent_date_time = tb_blackhawk.pcap_sent_date_time))'
'               -> Nested Loop (cost=789.32..3490.06 rows=1160 width=229) (actual time=4.954..21.640 rows=1810 loops=1)'
'                  Join Filter: ((tb_blackhawk_sent.pcap_sent_date_time >= sf_group_join_times_for_collector_in_test.join_time) AND (tb_blackhawk_sent.pcap_sent_date_time <= sf_group_join_times_for_collector_in_test.stop_time))'
'                  -> Function Scan on sf_group_join_times_for_collector_in_test (cost=0.00..265.00 rows=1 width=92) (actual time=2.992..2.998 rows=2 loops=1)'
'                     Filter: (((group_ip)::text = '239.0.0.1'::text) AND (listen_port = 5001))'
'                  -> Bitmap Heap Scan on tb_blackhawk_sent (cost=789.32..3068.48 rows=10439 width=137) (actual time=1.762..8.630 rows=1862 loops=2)'
'                     Recheck Cond: (tb_blackhawk_sent.test_id = 219)'
'                     Filter: ((tb_blackhawk_sent.pcap_sent_date_time >= '-infinity'::timestamp without time zone) AND (tb_blackhawk_sent.pcap_sent_date_time <= 'infinity'::timestamp without time zone) AND ((tb_blackhawk_sent.destination_ip)::text = '239.0.0.1'::text) AND (tb_blackhawk_sent.destination_port = 5001))'
'                     -> Bitmap Index Scan on tb_blackhawk_sent_pk2 (cost=0.00..786.71 rows=13118 width=0) (actual time=1.730..1.730 rows=13031 loops=2)'
'                        Index Cond: (tb_blackhawk_sent.test_id = 219)'
'               -> Hash (cost=10451.04..10451.04 rows=4274 width=117) (actual time=14.508..14.508 rows=1361 loops=1)'
'                  -> Bitmap Heap Scan on tb_blackhawk (cost=3179.36..10451.04 rows=4274 width=117) (actual time=9.792..13.853 rows=1361 loops=1)'
'                     Recheck Cond: ((test_id = 219) AND ((collector_ip)::text = '192.168.145.2'::text))'
'                     Filter: (((destination_ip)::text = '239.0.0.1'::text) AND (destination_port = 5001))'
'                     -> Bitmap Index Scan on tb_blackhawk_pk2 (cost=0.00..3178.29 rows=5725 width=0) (actual time=9.753..9.753 rows=9437 loops=1)'
'                        Index Cond: ((test_id = 219) AND ((collector_ip)::text = '192.168.145.2'::text))'
'    -> CTE Scan on subquery (cost=0.00..23.20 rows=1160 width=44) (actual time=42.343..45.235 rows=1810 loops=1)'
'Total runtime: 46.436 ms'

3) fast with variables: I changed where I put the constraint, which allowed
me to keep this as a normal sql stored function without relying on the
plpgsql RETURN QUERY EXECUTE fix

'CTE Scan on subquery2 (cost=11879.96..11880.00 rows=1 width=88) (actual time=39.888..39.889 rows=1 loops=1)'
'  CTE subquery2'
'    -> HashAggregate (cost=11879.95..11879.96 rows=1 width=44) (actual time=39.867..39.868 rows=1 loops=1)'
'       CTE subquery'
'         -> Sort (cost=11874.02..11874.56 rows=215 width=346) (actual time=35.562..35.729 rows=1810 loops=1)'
'            Sort Key: tb_blackhawk_sent.sent_date_time'
'            Sort Method: quicksort Memory: 870kB'
'            -> Hash Left Join (cost=9309.88..11865.69 rows=215 width=346) (actual time=18.192..31.508 rows=1810 loops=1)'
'               Hash Cond: (((tb_blackhawk_sent.collector_ip)::text = (tb_blackhawk.source_ip)::text) AND (tb_blackhawk_sent.sequence_num = tb_blackhawk.sequence_num) AND (tb_blackhawk_sent.pcap_sent_date_time = tb_blackhawk.pcap_sent_date_time))'
'               -> Nested Loop (cost=814.26..3328.94 rows=215 width=229) (actual time=4.538..15.664 rows=1810 loops=1)'
'                  Join Filter: ((tb_blackhawk_sent.pcap_sent_date_time >= sf_group_join_times_for_collector_in_test.join_time) AND (tb_blackhawk_sent.pcap_sent_date_time <= sf_group_join_times_for_collector_in_test.stop_time))'
'                  -> Function Scan on sf_group_join_times_for_collector_in_test (cost=0.00..265.00 rows=1 width=92) (actual time=2.958..2.963 rows=2 loops=1)'
'                     Filter: (((group_ip)::text = $3) AND (listen_port = $4))'
'                  -> Bitmap Heap Scan on tb_blackhawk_sent (cost=814.26..3034.96 rows=1932 width=137) (actual time=1.573..5.569 rows=1862 loops=2)'
'                     Recheck Cond: (tb_blackhawk_sent.test_id = $1)'
'                     Filter: (((tb_blackhawk_sent.destination_ip)::text = $3) AND (tb_blackhawk_sent.destination_port = $4))'
'                     -> Bitmap Index Scan on tb_blackhawk_sent_pk2 (cost=0.00..813.77 rows=13526 width=0) (actual time=1.541..1.541 rows=13031 loops=2)'
'                        Index Cond: (tb_blackhawk_sent.test_id = $1)'
'               -> Hash (cost=8485.14..8485.14 rows=599 width=117) (actual time=13.640..13.640 rows=1361 loops=1)'
'                  -> Bitmap Heap Scan on tb_blackhawk (cost=1924.02..8485.14 rows=599 width=117) (actual time=9.240..13.035 rows=1361 loops=1)'
'                     Recheck Cond: ((test_id = $1) AND ((collector_ip)::text = ($2)::text))'
'                     Filter: (((destination_ip)::text = $3) AND (destination_port = $4))'
'                     -> Bitmap Index Scan on tb_blackhawk_pk2 (cost=0.00..1923.87 rows=4193 width=0) (actual time=9.201..9.201 rows=9437 loops=1)'
'                        Index Cond: ((test_id = $1) AND ((collector_ip)::text = ($2)::text))'
'    -> CTE Scan on subquery (cost=0.00..5.38 rows=1 width=44) (actual time=35.571..38.811 rows=1810 loops=1)'
'       Filter: ((sent_date_time_a >= $5) AND (sent_date_time_a <= $6))'
'Total runtime: 40.092 ms'
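
In outline, the only difference between case 1 and case 3 is where the
time-window predicate sits (tables and columns simplified here to a sketch
with invented names; the real query is much larger):

```sql
-- Case 1 shape: the window is applied inside the CTE, on the inner scan,
-- using the function parameters ($1, $5, $6).
WITH subquery AS (
    SELECT sequence_num, sent_date_time
    FROM tb_blackhawk_sent
    WHERE test_id = $1
      AND pcap_sent_date_time >= $5
      AND pcap_sent_date_time <= $6
)
SELECT count(*) FROM subquery;

-- Case 3 shape: the window is applied in the final pass over the CTE
-- instead, which gave the planner a much better plan for the inner scan.
WITH subquery AS (
    SELECT sequence_num, sent_date_time AS sent_date_time_a
    FROM tb_blackhawk_sent
    WHERE test_id = $1
)
SELECT count(*) FROM subquery
WHERE sent_date_time_a >= $5
  AND sent_date_time_a <= $6;
```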

Hope you can read that better than I can!

Thanks!

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#6)
Re: SQL code runs slower as a stored function

Stephen Frost <sfrost@snowman.net> writes:

* S G (sgennaria2@gmail.com) wrote:

I guess to really get down to the issue, I'm curious if what I'm doing is
considered 'standard procedure' to others-- i.e. using funny workarounds
like building the query in a text var and executing it with plpgsql's RETURN
QUERY EXECUTE command.

It's certainly not unusual when it's necessary. We do try to minimize
the times it's necessary by making the planner as smart as possible.

Yeah. This is a known sore spot that I hope to do something about in
the 9.1 development cycle.

regards, tom lane