Performance issues with v18 SQL-language-function changes
In the thread that led up to commit 0dca5d68d [1]/messages/by-id/8216639.NyiUUSuA9g@aivenlaptop, we'd convinced
ourselves that the new implementation was faster than the old.
So I was sad to discover that there are common cases where it's
a good bit slower. We'd focused too much on test methods like
do $$
begin
for i in 1..10000000 loop
perform fx((random()*100)::int);
end loop;
end;
$$;
The thing about this test case is that the SQL function under
test is executed only once per calling query (i.e., per PERFORM).
That doesn't allow the old implementation to amortize anything.
If you instead test cases like
create function fx(p_summa bigint) returns text immutable strict
return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));
explain analyze select fx(i) from generate_series(1,1000000) as i(i);
you arrive at the rude discovery that 0dca5d68d is about 50% slower
than 0dca5d68d^, because the old implementation builds a plan for fx()
only once and then re-uses it throughout the query. So does the new
implementation, but it has added GetCachedPlan overhead. Moreover,
I made the unforced error of deciding that we could tear down and
rebuild the SQLFunctionCache and subsidiary data structures for
each call. That overhead is relatively minor in comparison to the
cost of parsing and planning the function; but when comparing cases
where there's no repeated parsing and planning, it becomes
significant.
Hence, the attached patch series reverts that decision and goes back
to the old method of having the SQLFunctionCache and some associated
objects live as long as the FmgrInfo does (without, however, the
poorly-controlled memory leaks of the old code). In my testing
this gets us from a 50% penalty down to about 5%, which I think is
acceptable given the other benefits 0dca5d68d brought us.
I'm inclined to argue that, seeing that 0dca5d68d was mainly intended
as a performance feature, this performance loss is a bug that we
need to do something about even though we're post-feature-freeze.
We could either revert 0dca5d68d or apply the attached. I'd prefer
the latter.
(There are other things we could do to try to reduce the overhead
further, such as trying to not build a Tuplestore or JunkFilter in
simple cases. But that seems like new work not a fix for a bad
decision in existing work, so I think it's out of scope for v18.)
Comments?
regards, tom lane
Attachments:
v1-0001-Minor-performance-improvement-for-SQL-language-fu.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Minor-performance-improvement-for-SQL-language-fu.p; name*1=atchDownload+11-9
v1-0002-Make-functions.c-mostly-run-in-a-short-lived-memo.patchtext/x-diff; charset=us-ascii; name*0=v1-0002-Make-functions.c-mostly-run-in-a-short-lived-memo.p; name*1=atchDownload+95-59
v1-0003-Split-some-storage-out-to-separate-subcontexts-of.patchtext/x-diff; charset=us-ascii; name*0=v1-0003-Split-some-storage-out-to-separate-subcontexts-of.p; name*1=atchDownload+67-16
v1-0004-Make-SQLFunctionCache-long-lived-again.patchtext/x-diff; charset=us-ascii; name=v1-0004-Make-SQLFunctionCache-long-lived-again.patchDownload+105-156
v1-0005-Cache-typlens-of-a-SQL-function-s-input-arguments.patchtext/x-diff; charset=us-ascii; name*0=v1-0005-Cache-typlens-of-a-SQL-function-s-input-arguments.p; name*1=atchDownload+12-2
On Sun, Apr 13, 2025 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
create function fx(p_summa bigint) returns text immutable strict
return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));explain analyze select fx(i) from generate_series(1,1000000) as i(i);
you arrive at the rude discovery that 0dca5d68d is about 50% slower
than 0dca5d68d^, because the old implementation builds a plan for fx()
only once and then re-uses it throughout the query.
I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi
po 14. 4. 2025 v 16:38 odesílatel Robert Haas <robertmhaas@gmail.com>
napsal:
On Sun, Apr 13, 2025 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
create function fx(p_summa bigint) returns text immutable strict
return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));explain analyze select fx(i) from generate_series(1,1000000) as i(i);
you arrive at the rude discovery that 0dca5d68d is about 50% slower
than 0dca5d68d^, because the old implementation builds a plan for fx()
only once and then re-uses it throughout the query.I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.
I can confirm that all tests passed, and patched code is about 5% faster
than the current master (tested on my slower notebook). So it should to fix
performance regression where it was it against pg17 (it was about 2%)
(tested without assertions)
Regards
Pavel
Show quoted text
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Apr 14, 2025 at 10:38:29AM -0400, Robert Haas wrote:
On Sun, Apr 13, 2025 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
create function fx(p_summa bigint) returns text immutable strict
return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));explain analyze select fx(i) from generate_series(1,1000000) as i(i);
you arrive at the rude discovery that 0dca5d68d is about 50% slower
than 0dca5d68d^, because the old implementation builds a plan for fx()
only once and then re-uses it throughout the query.I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.
Yep, we went down the road in PG 18 to convert syntax, and now we have
to fix this, or we have to revert all the PG 18 syntax changes, which
seems like a step backward.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
Bruce Momjian <bruce@momjian.us> writes:
On Mon, Apr 14, 2025 at 10:38:29AM -0400, Robert Haas wrote:
I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.
Yep, we went down the road in PG 18 to convert syntax, and now we have
to fix this, or we have to revert all the PG 18 syntax changes, which
seems like a step backward.
I'm confused? 0dca5d68d didn't have anything to do with
syntax changes, just with when planning happens.
regards, tom lane
Hi
st 16. 4. 2025 v 19:43 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Bruce Momjian <bruce@momjian.us> writes:
On Mon, Apr 14, 2025 at 10:38:29AM -0400, Robert Haas wrote:
I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.Yep, we went down the road in PG 18 to convert syntax, and now we have
to fix this, or we have to revert all the PG 18 syntax changes, which
seems like a step backward.I'm confused? 0dca5d68d didn't have anything to do with
syntax changes, just with when planning happens.
yes, and for most cases it has significant performance benefits. For a few
corner cases, there can be some slowdown that can be fixed by last Tom
patches.
Regards
Pavel
Show quoted text
regards, tom lane
On Wed, Apr 16, 2025 at 01:43:46PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
On Mon, Apr 14, 2025 at 10:38:29AM -0400, Robert Haas wrote:
I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.Yep, we went down the road in PG 18 to convert syntax, and now we have
to fix this, or we have to revert all the PG 18 syntax changes, which
seems like a step backward.I'm confused? 0dca5d68d didn't have anything to do with
syntax changes, just with when planning happens.
I was referencing the contrib initialization functions you converted to
use SQL-standard function bodies:
commit 68ff25eef12
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Dec 29 14:58:05 2024 -0500
contrib/pageinspect: Use SQL-standard function bodies.
In the same spirit as 969bbd0fa, 13e3796c9, 3f323eba8.
Tom Lane and Ronan Dunklau
Discussion: /messages/by-id/3316564.aeNJFYEL58@aivenlaptop
I thought that was what you were saying were now slower; maybe I was
confused.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
Bruce Momjian <bruce@momjian.us> writes:
On Wed, Apr 16, 2025 at 01:43:46PM -0400, Tom Lane wrote:
I'm confused? 0dca5d68d didn't have anything to do with
syntax changes, just with when planning happens.
I was referencing the contrib initialization functions you converted to
use SQL-standard function bodies:
Nah, that's not really relevant. The speed concerns I have here
are mostly independent of whether the SQL function is written
in string or SQL-standard form. Also, I think all of those
contrib functions that are at all performance-relevant are
capable of being inlined, and so wouldn't reach this code anyway.
regards, tom lane
Hello Tom,
Sorry if I've chosen the wrong thread, but starting from 0313c5dc6,
the following script:
CREATE TYPE aggtype AS (a int, b text);
CREATE FUNCTION aggfns_trans(aggtype[], integer, text) RETURNS aggtype[] LANGUAGE sql AS 'SELECT
array_append($1,ROW($2,$3)::aggtype)';
CREATE AGGREGATE aggfns(integer, text) (SFUNC = public.aggfns_trans, STYPE = public.aggtype[], INITCOND = '{}');
CREATE TABLE t(i int, k int);
INSERT INTO t SELECT 1, 2 FROM generate_series(1, 4000);
SET statement_timeout = '10s';
SELECT aggfns(i, repeat('x', 8192)) OVER (PARTITION BY i) FROM t;
crashes the server for me like this:
corrupted size vs. prev_size while consolidating
2025-04-30 19:40:04.209 UTC [286426] LOG: client backend (PID 286441) was terminated by signal 6: Aborted
2025-04-30 19:40:04.209 UTC [286426] DETAIL: Failed process was running: SELECT aggfns(i, repeat('x', 8192)) OVER
(PARTITION BY i) FROM t;
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x000073cc15c4526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x000073cc15c288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x000073cc15c297b6 in __libc_message_impl (fmt=fmt@entry=0x73cc15dce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:132
#6 0x000073cc15ca8fe5 in malloc_printerr (str=str@entry=0x73cc15dd1b30 "corrupted size vs. prev_size while
consolidating") at ./malloc/malloc.c:5772
#7 0x000073cc15cab144 in _int_free_merge_chunk (av=0x73cc15e03ac0 <main_arena>, p=0x5cb3ac57b2c0, size=12541904) at
./malloc/malloc.c:4695
#8 0x000073cc15cadd9e in __GI___libc_free (mem=mem@entry=0x5cb3acd73290) at ./malloc/malloc.c:3398
#9 0x00005cb3a0c2db4c in AllocSetFree (pointer=0x5cb3acd732c8) at aset.c:1107
#10 0x00005cb3a0c381f8 in pfree (pointer=<optimized out>) at mcxt.c:241
#11 0x00005cb3a067de98 in heap_free_minimal_tuple (mtup=<optimized out>) at heaptuple.c:1532
#12 0x00005cb3a08b86a1 in tts_minimal_clear (slot=0x5cb3ac576fb0) at execTuples.c:532
#13 0x00005cb3a08ab16e in ExecClearTuple (slot=0x5cb3ac576fb0) at ../../../src/include/executor/tuptable.h:460
#14 ExecFilterJunk (junkfilter=<optimized out>, slot=<optimized out>) at execJunk.c:277
#15 0x00005cb3a08bdb03 in sqlfunction_receive (slot=<optimized out>, self=0x5cb3ac525ce0) at functions.c:2597
#16 0x00005cb3a08ab4e7 in ExecutePlan (dest=0x5cb3ac525ce0, direction=<optimized out>, numberTuples=1, sendTuples=true,
operation=CMD_SELECT,
queryDesc=0x5cb3ac525d30) at execMain.c:1814
...
With some script modifications, I observed also other memory-context-
related crashes.
(Probably this effect is achieved just because of the performance
optimization -- I haven't look deeper yet.)
This issue is discovered with SQLsmith.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Alexander Lakhin <exclusion@gmail.com> writes:
Sorry if I've chosen the wrong thread, but starting from 0313c5dc6,
the following script:
...
crashes the server for me like this:
corrupted size vs. prev_size while consolidating
Yup, duplicated here. Thanks for the report!
regards, tom lane
Alexander Lakhin <exclusion@gmail.com> writes:
Sorry if I've chosen the wrong thread, but starting from 0313c5dc6,
the following script:
CREATE TYPE aggtype AS (a int, b text);
CREATE FUNCTION aggfns_trans(aggtype[], integer, text) RETURNS aggtype[] LANGUAGE sql AS 'SELECT
array_append($1,ROW($2,$3)::aggtype)';
CREATE AGGREGATE aggfns(integer, text) (SFUNC = public.aggfns_trans, STYPE = public.aggtype[], INITCOND = '{}');
CREATE TABLE t(i int, k int);
INSERT INTO t SELECT 1, 2 FROM generate_series(1, 4000);
SET statement_timeout = '10s';
SELECT aggfns(i, repeat('x', 8192)) OVER (PARTITION BY i) FROM t;
crashes the server for me like this:
corrupted size vs. prev_size while consolidating
Hmm. What seems to be going on here is that once the aggfns_trans()
result gets large enough that the SQL-function-result tuplestore
decides to spill to disk, when we pull the result tuple back out
of the tuplestore with tuplestore_gettupleslot we end up with the
jf_resultSlot holding a should-free tuple pointer that points into
the tuplestore's storage. After tuplestore_clear that is a dangling
pointer, and the next use of the jf_resultSlot fails while trying to
free the tuple.
So the attached fixes it for me, but I'm still mighty confused
because I don't understand why it didn't fail in the old code.
This logic doesn't seem noticeably different from before, and
there's even a very old comment (in the SRF path) alleging that
/* NB: this might delete the slot's content, but we don't care */
Now we *do* care, but what changed?
(As an aside, seems like tuplestore is leaving money on the table,
because there's hardly any point in spilling to disk when it never
holds more than one tuple. But that's not something to change now.)
regards, tom lane
Attachments:
fix-dangling-tupleslot-pointer.patchtext/x-diff; charset=us-ascii; name=fix-dangling-tupleslot-pointer.patchDownload+6-1
I wrote:
Hmm. What seems to be going on here is that once the aggfns_trans()
result gets large enough that the SQL-function-result tuplestore
decides to spill to disk, when we pull the result tuple back out
of the tuplestore with tuplestore_gettupleslot we end up with the
jf_resultSlot holding a should-free tuple pointer that points into
the tuplestore's storage. After tuplestore_clear that is a dangling
pointer, and the next use of the jf_resultSlot fails while trying to
free the tuple.
I still haven't figured out why this wasn't a problem in the old
version of functions.c. However, I did realize that my pending
patch at [1]/messages/by-id/2443532.1744919968@sss.pgh.pa.us gets rid of the problem in another way, by removing
functions.c's use of tuplestore_gettupleslot altogether. Now
I'm tempted to just push that, instead of applying a band-aid that
will leave v18 doing this differently from both earlier and later
branches.
regards, tom lane
I wrote:
I still haven't figured out why this wasn't a problem in the old
version of functions.c.
Oh, got it: I was misremembering the API contract for
tuplestore_gettupleslot. There are two possible ways for it to
load the slot with a tuple:
* tuple points into the tuplestore's memory, and the slot's
should_free flag is false. In this case, tuplestore_clear
leaves the slot pointing to garbage, but it doesn't matter
because we won't try to access nor free the garbage. (This
is the case that comment was talking about.)
* tuple points to a tuple palloc'd in *the caller's memory
context* --- not the tuplestore's memory as I was thinking.
In this case the slot's should_free flag is true.
Thus the bug arises because the caller's memory context is now the
relatively short-lived per-tuple context that fmgr_sql is called in,
where in prior versions it was the SQL function's long-lived fcontext.
The slot we're putting it in is also long-lived. So what happens
is that on the next fmgr_sql call, after the per-tuple context has
been reset and thus discarded the palloc'd tuple, the slot still
remembers that it's supposed to free the tuple and does so. Kaboom.
But in released branches, the slot and the palloc'd tuple are in the
same context and there's no such hazard.
The quick-hack patch I posted upthread fixes this by ensuring we
clear the slot (and thus let it pfree the tuple) before context
reset of the per-tuple context wipes the tuple from underneath it.
I'm still inclined to fix this by using the patch from [1]/messages/by-id/2443532.1744919968@sss.pgh.pa.us though.
Aside from the argument that it'd be better if v18 were like later
branches here, I now realize that there's at least one false statement
in my argument at [1]/messages/by-id/2443532.1744919968@sss.pgh.pa.us:
Given the lack of field complaints over the many years it's been
like this, there doesn't seem to be a live problem. I think the
explanation for that is
(1) those mechanisms are never used to call set-returning functions,
(2) therefore, the tuplestore will not be called on to hold more
than one result row,
(3) therefore, it won't get large enough that it wants to spill
to disk,
(4) therefore, its potentially dangling resowner pointer is never
used.
However, this is an uncomfortably shaky chain of assumptions.
This example has shown that tuplestore.c is capable of spilling to
disk even if it is storing only a single tuple, which means (3) is
wrong, which means the tuplestore *can* touch its resowner pointer
if the function result is wide enough. I wonder if this means we
have a reachable bug in released branches.
regards, tom lane