slow performance of array_agg after upgrade from 9.2 to 9.5

Started by jaroetover 9 years ago3 messagesgeneral
Jump to latest
#1jaroet
jaroet@gmail.com

Internally we upgraded from 9.2 to 9.5 en we had defined an median function.
This became about 7 to 8 times slower using the same functions.

They are defined like this:

CREATE OR REPLACE FUNCTION public._final_median(anyarray)
RETURNS double precision
LANGUAGE sql
AS
$body$
WITH q AS
(
SELECT val
FROM unnest($1) val
WHERE VAL IS NOT NULL
ORDER BY 1
),
cnt AS
(
SELECT COUNT(*) AS c FROM q
)
SELECT AVG(val)::float8
FROM
(
SELECT val FROM q
LIMIT 2 - MOD((SELECT c FROM cnt), 2)
OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1,0)
) q2;
$body$
IMMUTABLE
COST 100;

CREATE AGGREGATE median(anyelement)
(
sfunc = array_append,
stype = anyarray,
finalfunc = _final_median,
initcond = '{}'
);

All SQL still work but a lot slower now. Our tables on which we use this
function are between 5.000 and 150.000 rows with between 18 and 800 columns.

We found that the median function that fills an array is the slow part. When
we change our SQL from median(fieldname) to
_final_median(array_agg(fieldname)) the performance is even 3 times faster
than on 9.2.

So it looks like the array_agg function when used in a self-defined function
is extremly slow.

As we have a lot of files in our ETL proces where a lot of median functions
are used we tried to fix this issue instead of altering the median SQL as
mentioned above. But we are not yet succeeding.

Anybody had this issue and knows about a way to solve this gracefully?

Regards,
jaroet

--
View this message in context: http://postgresql.nabble.com/slow-performance-of-array-agg-after-upgrade-from-9-2-to-9-5-tp5927751.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: jaroet (#1)
Re: slow performance of array_agg after upgrade from 9.2 to 9.5

2016-10-26 15:06 GMT+02:00 jaroet <jaroet@gmail.com>:

Internally we upgraded from 9.2 to 9.5 en we had defined an median
function.
This became about 7 to 8 times slower using the same functions.

They are defined like this:

CREATE OR REPLACE FUNCTION public._final_median(anyarray)
RETURNS double precision
LANGUAGE sql
AS
$body$
WITH q AS
(
SELECT val
FROM unnest($1) val
WHERE VAL IS NOT NULL
ORDER BY 1
),
cnt AS
(
SELECT COUNT(*) AS c FROM q
)
SELECT AVG(val)::float8
FROM
(
SELECT val FROM q
LIMIT 2 - MOD((SELECT c FROM cnt), 2)
OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1,0)
) q2;
$body$
IMMUTABLE
COST 100;

CREATE AGGREGATE median(anyelement)
(
sfunc = array_append,
stype = anyarray,
finalfunc = _final_median,
initcond = '{}'
);

All SQL still work but a lot slower now. Our tables on which we use this
function are between 5.000 and 150.000 rows with between 18 and 800
columns.

We found that the median function that fills an array is the slow part.
When
we change our SQL from median(fieldname) to
_final_median(array_agg(fieldname)) the performance is even 3 times faster
than on 9.2.

So it looks like the array_agg function when used in a self-defined
function
is extremly slow.

As we have a lot of files in our ETL proces where a lot of median functions
are used we tried to fix this issue instead of altering the median SQL as
mentioned above. But we are not yet succeeding.

This is pretty strange - please, can you send test case?

can you try to rewrite your SQL functions to plpgsql? There can be some
changes with inlining of SQL functions.

Regards

Pavel

Anybody had this issue and knows about a way to solve this gracefully?

you can try to use buildin function percentile
https://www.postgresql.org/docs/9.5/static/functions-aggregate.html

Regards

Pavel

Show quoted text

Regards,
jaroet

--
View this message in context: http://postgresql.nabble.com/
slow-performance-of-array-agg-after-upgrade-from-9-2-to-9-5-tp5927751.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: jaroet (#1)
Re: slow performance of array_agg after upgrade from 9.2 to 9.5

jaroet <jaroet@gmail.com> writes:

Internally we upgraded from 9.2 to 9.5 en we had defined an median function.
This became about 7 to 8 times slower using the same functions.
...
We found that the median function that fills an array is the slow part. When
we change our SQL from median(fieldname) to
_final_median(array_agg(fieldname)) the performance is even 3 times faster
than on 9.2.
So it looks like the array_agg function when used in a self-defined function
is extremly slow.

But you're not using array_agg, you're using array_append. That isn't
what array_agg is built on. Unfortunately, what array_agg is built on
is something that isn't very convenient to use for custom aggregates,
because it relies on an "internal"-type transition value.

I don't have any magic fix for getting this back to the previous level
of performance, but have started a thread about it on -hackers:
/messages/by-id/6315.1477677885@sss.pgh.pa.us

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general