14.1 immutable function, bad performance if check number = 'NaN'

Started by Federico Travagliniabout 4 years ago7 messagesbugs

Jump to latest

Federico Travaglini

federico.travaglini@aubay.it

about 4 years ago

Good evening, and thanks to your excellent Postgres.

This funcion in used as a column in a select on about 400k records

If I leave the highlighted row it takes 27 seconds, otherwise 14 seconds!
Such behaviour looks not to be reasonable.

By the way, in my case I can remove that line, because the function
behaviour is the same, but I wanted to provide my very little contribution.

Bye

Federico

*CREATE* *OR* *REPLACE* *FUNCTION*
geo_ants.antsgeo_get_severity_thr(v_measure_value *double* *precision*,
thr_value_1 *double* *precision*, thr_value_2 *double* *precision*,
thr_value_3 *double* *precision*, thr_value_4 *double* *precision*,
thr_value_5 *double* *precision*)

*RETURNS* *text*

*LANGUAGE* *sql*

*immutable*

--IMMUTABLE PARALLEL SAFE

*AS* *$function$*

----------------------------------------------------------------------------------------------------------------------

-- Author: Federico Travaglini

-- Date: 2020

-- Description:

-- Change Hist: please mark changes in code as "yyyy-mm-dd, Author, change
request id in Merant, brief description"

----------------------------------------------------------------------------------------------------------------------

*select*

*case*

*WHEN* v_measure_value= 'NaN' *THEN* '6 Unk'::*text*

*when* thr_value_1 = thr_value_4 *then* -- colorazione
disabilitata, ad esempio per lat, long...

'6 none'::*text*

*when* thr_value_1 > thr_value_4 *then* -- valori critical >
clear

-- SIAMO NEL CASO: valori critical > clear ( thr_5 clear
thr_4 warning thr_3 minor thr_2 major thr_1 critical)

*CASE*

*WHEN* v_measure_value >= thr_value_1 *THEN* '5
Critical'::*text* --critical

*WHEN* v_measure_value < thr_value_1 *AND*
v_measure_value >= thr_value_2 *THEN* '4 Major'::*text* --major

*WHEN* v_measure_value < thr_value_2 *AND*
v_measure_value >= thr_value_3 *THEN* '3 Minor'::*text* --minor

*WHEN* v_measure_value < thr_value_3 *AND*
v_measure_value >= thr_value_4 *THEN* '2 Warning'::*text* --warning

*WHEN* v_measure_value < thr_value_4 *THEN* '1 Clear'::
*text* --clear

*ELSE* '6 Unk'::*text* -- null values

*end*

*else*

-- SIAMO NEL CASO: valori critical < clear (critical thr_1
maj thr_2 minor thr_3 war thr_4 clear thr_5)

*CASE*

*WHEN* v_measure_value < thr_value_1 *THEN* '5 Critical'
::*text* --critical

*WHEN* v_measure_value >= thr_value_1 *AND*
v_measure_value < thr_value_2 *THEN* '4 Major'::*text* --major

*WHEN* v_measure_value >= thr_value_2 *AND*
v_measure_value < thr_value_3 *THEN* '3 Minor'::*text* --minor

*WHEN* v_measure_value >= thr_value_3 *AND*
v_measure_value < thr_value_4 *THEN* '2 Warning'::*text* --warning

*WHEN* v_measure_value >= thr_value_4 *THEN* '1 Clear'::
*text* --clear

*ELSE* '6 Unk'::*text* -- null values

*end*

*end*::*text*

*$function$*

;

*Federico TRAVAGLINI*

*Project Manager*

<https://www.aubay.it/>

*AUBAY ITALIA*

Via Cesare Giulio Viola 19 (Torre C) - 00197 Roma

*Office :* (+39) 06 83780225
*Mobile :* (+39) 339 7521520

<https://www.linkedin.com/company/aubay-italy/>

<https://twitter.com/Aubay_Italia>

<https://www.facebook.com/aubayit/>

<https://www.instagram.com/aubayitalia/>

This message is confidential and solely for the intended
address(es). If
you are not the intended recipient of this message, please
notify the sender
immediately and delete it from your system. Unauthorised
reproduction,
disclosure, modification and or distribution of this e-mail
is strictly
prohibited. The contents of this e-mail do not constitute a
commitment by Aubay
S.p.A., except where expressly provided for in a
written agreement between you
and Aubay.

Tom Lane

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Federico Travaglini (#1)

Re: 14.1 immutable function, bad performance if check number = 'NaN'

Federico Travaglini <federico.travaglini@aubay.it> writes:

This funcion in used as a column in a select on about 400k records
If I leave the highlighted row it takes 27 seconds, otherwise 14 seconds!
Such behaviour looks not to be reasonable.

It's not at all clear which line you think is the "highlighted" one.

However, I'm guessing that this SQL function is a candidate for
inlining, so you might try comparing EXPLAIN VERBOSE output for
the query with both forms of the function. Perhaps that will
yield some insight into what's expensive.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

about 4 years ago

In reply to: Tom Lane (#2)

Re: 14.1 immutable function, bad performance if check number = 'NaN'

On Monday, April 25, 2022, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Federico Travaglini <federico.travaglini@aubay.it> writes:

This funcion in used as a column in a select on about 400k records
If I leave the highlighted row it takes 27 seconds, otherwise 14 seconds!
Such behaviour looks not to be reasonable.

It's not at all clear which line you think is the "highlighted" one.

Its the comparison of the double input value to the untyped literal ‘NaN’
(the first case test).

David J.

Merlin Moncure

mmoncure@gmail.com

about 4 years ago

In reply to: Federico Travaglini (#1)

Re: 14.1 immutable function, bad performance if check number = 'NaN'

On Mon, Apr 25, 2022 at 11:58 AM Federico Travaglini <
federico.travaglini@aubay.it> wrote:

Good evening, and thanks to your excellent Postgres.

This funcion in used as a column in a select on about 400k records

If I leave the highlighted row it takes 27 seconds, otherwise 14 seconds!
Such behaviour looks not to be reasonable.

lightly testing this, I got 10million iterations in about two seconds,
about the same after commenting the NaN test. Given that, problem is
probably failure to inline query. Careful examination of explain of
wrapping query should prove that.

merlin

Federico Travaglini

federico.travaglini@collaboration.aubay.it

about 4 years ago

In reply to: Merlin Moncure (#4)

R: 14.1 immutable function, bad performance if check number = 'NaN'

Good morning, thank you very much for the time you spent for my question.

Yes inlining could be the problem, because maybe does not allow to use the
IMMUTABLE feature?

The context of the query is quite complex, therefore I avoided to provide
it in previous email

Here it is what I tested. I’s a code fragment from a bigger procedure. The
strings in green are passed as parameters, as well as the thresholds
1,2,3,4,5. To test just this fragment of code I replaced them with fixed
values

*SET* random_page_cost = 0.1; (otherwise it takes more than 4 minutes in
place of 33 sec)

*EXPLAIN* (*ANALYZE*, BUFFERS, *verbose*)

*select*

* *from*

(

*select*

tms,

fh.file_id,

(e.measure_list #>> ('{' || 'cluster_comuni_italiani' ||
',s}')::*text*[]) *as* value_s_1,

(e.measure_list #> ('{' || 'cluster_comuni_italiani' ||
',n}')::*text*[])::*numeric* *as* value_n_1,

(e.measure_list #> ('{' || 'cluster_comuni_italiani' ||
',o}')::*text*[])::*numeric* *as* value_o_1,

antsgeo_get_severity_thr((e.measure_list #> ('{' ||
'cluster_comuni_italiani' || ',o}')::*text*[])::*numeric*, 1, 2, 3, 4, 5)
*AS* severity_1,

(e.measure_list #>> ('{' ||
'act_geoposition_pers_act_confidence' || ',s}')::*text*[]) *as* value_s_2,

(e.measure_list #> ('{' ||
'act_geoposition_pers_act_confidence' || ',n}')::*text*[])::*numeric* *as*
value_n_2,

(e.measure_list #> ('{' ||
'act_geoposition_pers_act_confidence' || ',o}')::*text*[])::*numeric* *as*
value_o_2,

antsgeo_get_severity_thr((e.measure_list #> ('{' ||
'act_geoposition_pers_act_confidence' || ',o}')::*text*[])::*numeric*, 1, 2,
3, 4, 5) *AS* severity_2,

(e.measure_list #>> ('{' || 'act_coverage_band_pcell' ||
',s}')::*text*[]) *as* value_s_3,

(e.measure_list #> ('{' || 'act_coverage_band_pcell' ||
',n}')::*text*[])::*numeric* *as* value_n_3,

(e.measure_list #> ('{' || 'act_coverage_band_pcell' ||
',o}')::*text*[])::*numeric* *as* value_o_3,

antsgeo_get_severity_thr((e.measure_list #> ('{' ||
'act_coverage_band_pcell' || ',o}')::*text*[])::*numeric*, 1, 2, 3, 4, 5)
*AS* severity_3,

(e.measure_list #>> ('{' || *null*::*text* ||
',s}')::*text*[])
*as* value_s_4,

(e.measure_list #> ('{' || *null*::*text* || ',n}')::*text*
[])::*numeric* *as* value_n_4,

(e.measure_list #> ('{' || *null*::*text* || ',o}')::*text*
[])::*numeric* *as* value_o_4,

antsgeo_get_severity_thr((e.measure_list #> ('{' || *null*::
*text* || ',o}')::*text*[])::*numeric*, 1, 2, 3, 4, 5) *AS* severity_4

*from*

file_hist fh,

geo_measr_sample e

*where*

(

(fh.agn_group_id = 21)

*and* fh.data_min_tms <= '2022-04-25 00:00:00' *and*
fh.data_max_tms >= '2022-02-28 00:00:00' --lo usa

)

*and* fh.act_id = e.act_id

*and* (e.tms >= '2022-02-28 00:00:00' *and* e.tms <=
'2022-04-25
00:00:00')

*and* (e.measure_list #>> ('{act_edit,s}')::*text*[] *not*
*in* ('excld') *or* e.measure_list #>> ('{act_edit,s}')::*text*[] *is*
*null*)

)t1

e.measure_list is a jsonb, with a variable structure

{

"act_plmn": {

"s": "222/1"

"struct_day": {

"s": "2022-04-22"

"struct_week": {

"s": "2022-04-18"

"act_plmn_name": {

"s": "Tim.Ita (222-01)"

"struct_act_id": {

"s": "1809464"

"struct_tc_name": {

"s": "VoiceCall_MO"

"struct_yyyy_mm": {

"s": "2022-04"

"act_coverage_ci": {

"s": "63"

"act_coverage_ta": {

"n": 4,

"o": 4

"act_environment": {

"s": "in-door"

"cell_code_pcell": {

"s": "FE23E3"

"struct_act_code": {

"s": "20220422_164238_SDTU100010.01"

"struct_act_name": {

"s": "20220422_164238_SDTU100010.01. Copy of Voice MO 0687201815"

},…

Nested Loop (cost=0.43..2055500.00 rows=1441783 width=524) (actual
time=0.761..33647.744 rows=415401 loops=1)

Output: e.tms, fh.file_id, (e.measure_list #>>
('{cluster_comuni_italiani,s}'::cstring)::text[]), ((e.measure_list #>
('{cluster_comuni_italiani,n}'::cstring)::text[]))::numeric,
((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric, CASE WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '4'::double precision) THEN '1 Clear'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '3'::double precision) THEN '2 Warning'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '2'::double precision) THEN '3 Minor'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '1'::double precision) THEN '4 Major'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision < '1'::double precision) THEN '5 Critical'::text ELSE '6
Unk'::text END, (e.measure_list #>>
('{act_geoposition_pers_act_confidence,s}'::cstring)::text[]),
((e.measure_list #>
('{act_geoposition_pers_act_confidence,n}'::cstring)::text[]))::numeric,
((e.measure_list #>
('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric,
CASE WHEN ((((e.measure_list #>
('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double
precision >= '4'::double precision) THEN '1 Clear'::text WHEN
((((e.measure_list #>
('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double
precision >= '3'::double precision) THEN '2 Warning'::text WHEN
((((e.measure_list #>
('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double
precision >= '2'::double precision) THEN '3 Minor'::text WHEN
((((e.measure_list #>
('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double
precision >= '1'::double precision) THEN '4 Major'::text WHEN
((((e.measure_list #>
('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double
precision < '1'::double precision) THEN '5 Critical'::text ELSE '6
Unk'::text END, (e.measure_list #>>
('{act_coverage_band_pcell,s}'::cstring)::text[]), ((e.measure_list #>
('{act_coverage_band_pcell,n}'::cstring)::text[]))::numeric,
((e.measure_list #>
('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric, CASE WHEN
((((e.measure_list #>
('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double
precision >= '4'::double precision) THEN '1 Clear'::text WHEN
((((e.measure_list #>
('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double
precision >= '3'::double precision) THEN '2 Warning'::text WHEN
((((e.measure_list #>
('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double
precision >= '2'::double precision) THEN '3 Minor'::text WHEN
((((e.measure_list #>
('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double
precision >= '1'::double precision) THEN '4 Major'::text WHEN
((((e.measure_list #>
('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double
precision < '1'::double precision) THEN '5 Critical'::text ELSE '6
Unk'::text END, NULL::text, NULL::numeric, NULL::numeric, '6 Unk'::text

Buffers: shared hit=365255

-> Seq Scan on geo_ants.file_hist fh (cost=0.00..443.28 rows=311
width=8) (actual time=0.698..1.434 rows=315 loops=1)

Output: fh.file_id, fh.file_name, fh.rtu, fh.port, fh.act_code,
fh.file_size, fh.file_tms, fh.loaded_tms, fh.update_tms, fh.status,
fh.data_min_tms, fh.data_max_tms, fh.enh_tms, fh.file_type,
fh.partial_output_flag, fh.record_count, fh.status_description,
fh.act_lenght, fh.act_id, fh.file_act_done, fh.enh_start_tms, fh.agn_code,
fh.agn_group_id, fh.ts_sched_id, fh.ts_sched_ver, fh.enh_attempt,
fh.act_done_list, fh.data_max_proc_tms, fh.data_max_loaded_tms,
fh.error_count, fh.dbg_mode

Filter: ((fh.data_min_tms <= '2022-04-25 00:00:00'::timestamp
without time zone) AND (fh.data_max_tms >= '2022-02-28 00:00:00'::timestamp
without time zone) AND (fh.agn_group_id = 21))

Rows Removed by Filter: 3358

Buffers: shared hit=379

-> Append (cost=0.43..4609.77 rows=57257 width=1552) (actual
time=0.012..9.971 rows=1319 loops=315)

Buffers: shared hit=106416

-> Index Scan using geo_measr_sample_2022_02_act_id_tms_idx on
geo_ants.geo_measr_sample_2022_02 e_1 (cost=0.43..14.42 rows=166
width=1362) (actual time=0.003..0.003 rows=0 loops=315)

Output: e_1.tms, e_1.measure_list, e_1.act_id

Index Cond: ((e_1.act_id = fh.act_id) AND (e_1.tms >=
'2022-02-28 00:00:00'::timestamp without time zone) AND (e_1.tms <=
'2022-04-25 00:00:00'::timestamp without time zone))

Filter: (((e_1.measure_list #>> '{act_edit,s}'::text[]) <>
'excld'::text) OR ((e_1.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

Buffers: shared hit=946

-> Index Scan using geo_measr_sample_2022_03_act_id_tms_idx on
geo_ants.geo_measr_sample_2022_03 e_2 (cost=0.56..2333.98 rows=30845
width=1552) (actual time=0.006..7.586 rows=1061 loops=315)

Output: e_2.tms, e_2.measure_list, e_2.act_id

Index Cond: ((e_2.act_id = fh.act_id) AND (e_2.tms >=
'2022-02-28 00:00:00'::timestamp without time zone) AND (e_2.tms <=
'2022-04-25 00:00:00'::timestamp without time zone))

Filter: (((e_2.measure_list #>> '{act_edit,s}'::text[]) <>
'excld'::text) OR ((e_2.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

Rows Removed by Filter: 3

Buffers: shared hit=75873

-> Index Scan using geo_measr_sample_2022_04_act_id_tms_idx on
geo_ants.geo_measr_sample_2022_04 e_3 (cost=0.43..1975.08 rows=26246
width=1557) (actual time=0.005..2.232 rows=258 loops=315)

Output: e_3.tms, e_3.measure_list, e_3.act_id

Index Cond: ((e_3.act_id = fh.act_id) AND (e_3.tms >=
'2022-02-28 00:00:00'::timestamp without time zone) AND (e_3.tms <=
'2022-04-25 00:00:00'::timestamp without time zone))

Filter: (((e_3.measure_list #>> '{act_edit,s}'::text[]) <>
'excld'::text) OR ((e_3.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

Buffers: shared hit=29597

Query Identifier: -6803725219970975357

Planning:

Buffers: shared hit=933

Planning Time: 2.057 ms

Execution Time: 33677.292 ms

*RETURNS* *text*

*LANGUAGE* *sql*

*IMMUTABLE*

*AS* *$function$*

----------------------------------------------------------------------------------------------------------------------

-- Author: Federico Travaglini

-- Date: 2020

-- Description:

-- Change Hist: please mark changes in code as "yyyy-mm-dd, Author, change
request id in Merant, brief description"

----------------------------------------------------------------------------------------------------------------------

-- 20220426 non so perchè ma in questa versione non è efifciente

*select*

*case*

--WHEN v_measure_value= 'NaN' THEN '6 Unk'::text non
scommentare o le performance per qualche motivo iragionevole degradano di
molto.

*when* thr_value_1 = thr_value_4 *then* -- colorazione
disabilitata, ad esempio per lat, long...

'6 none'::*text*

*when* thr_value_1 > thr_value_4 *then* -- valori critical >
clear

-- SIAMO NEL CASO: valori critical > clear ( thr_5 clear
thr_4 warning thr_3 minor thr_2 major thr_1 critical)

*CASE*

*WHEN* v_measure_value >= thr_value_1 *THEN* '5
Critical'::*text* --critical

*WHEN* v_measure_value < thr_value_1 *AND*
v_measure_value >= thr_value_2 *THEN* '4 Major'::*text* --major

*WHEN* v_measure_value < thr_value_2 *AND*
v_measure_value >= thr_value_3 *THEN* '3 Minor'::*text* --minor

*WHEN* v_measure_value < thr_value_3 *AND*
v_measure_value >= thr_value_4 *THEN* '2 Warning'::*text* --warning

*WHEN* v_measure_value < thr_value_4 *THEN* '1 Clear'::
*text* --clear

*ELSE* '6 Unk'::*text* -- null values

*end*

*else*

-- SIAMO NEL CASO: valori critical < clear (critical thr_1
maj thr_2 minor thr_3 war thr_4 clear thr_5)

*CASE*

*WHEN* v_measure_value < thr_value_1 *THEN* '5 Critical'
::*text* --critical

*WHEN* v_measure_value >= thr_value_1 *AND*
v_measure_value < thr_value_2 *THEN* '4 Major'::*text* --major

*WHEN* v_measure_value >= thr_value_2 *AND*
v_measure_value < thr_value_3 *THEN* '3 Minor'::*text* --minor

*WHEN* v_measure_value >= thr_value_3 *AND*
v_measure_value < thr_value_4 *THEN* '2 Warning'::*text* --warning

*WHEN* v_measure_value >= thr_value_4 *THEN* '1 Clear'::
*text* --clear

*ELSE* '6 Unk'::*text* -- null values

*end*

*end*::*text*

*$function$*

;

By the way, if I call the overall function where it is this code fragment,
I get much better performance (22 sec in place of 41) re-writing function
case without nesting sub-cases, unfortunately I’m not so cleaver to get the
query plan for a query executed inside a function

*RETURNS* *text*

*LANGUAGE* *sql*

*IMMUTABLE*

*AS* *$function$*

----------------------------------------------------------------------------------------------------------------------

-- Author: Federico Travaglini

-- Date: 2020

-- Description:

-- Change Hist: please mark changes in code as "yyyy-mm-dd, Author, change
request id in Merant, brief description"

----------------------------------------------------------------------------------------------------------------------

*select*

*case*

--WHEN v_measure_value= 'NaN' THEN '6 Unk'::text this must be
commented, it is not a problem because the semantic does not change (same
case of the ELSE), but I don’t understand why it changes performance.

*when* thr_value_1 = thr_value_4 *then* '6 Unk'::*text* --
colorazione disabilitata, ad esempio per lat, long...

-- SIAMO NEL CASO: valori critical > clear ( thr_5 clear thr_4
warning thr_3 minor thr_2 major thr_1 critical)

*WHEN* thr_value_1 > thr_value_4 *and* v_measure_value <
thr_value_4 *THEN* '1 Clear'::*text* --clear

*WHEN* thr_value_1 > thr_value_4 *and* v_measure_value <
thr_value_3 *THEN* '2 Warning'::*text* --warning

*WHEN* thr_value_1 > thr_value_4 *and* v_measure_value <
thr_value_2 *THEN* '3 Minor'::*text* --minor

*WHEN* thr_value_1 > thr_value_4 *and* v_measure_value <
thr_value_1 *THEN* '4 Major'::*text* --major

*WHEN* thr_value_1 > thr_value_4 *and* v_measure_value >=
thr_value_1 *THEN* '5 Critical'::*text* --major

-- SIAMO NEL CASO: valori critical < clear (critical thr_1 maj
thr_2 minor thr_3 war thr_4 clear thr_5)

*WHEN* thr_value_1 < thr_value_4 *and* v_measure_value >=
thr_value_4 *THEN* '1 Clear'::*text* --clear

*WHEN* thr_value_1 < thr_value_4 *and* v_measure_value >=
thr_value_3 *THEN* '2 Warning'::*text* --warning

*WHEN* thr_value_1 < thr_value_4 *and* v_measure_value >=
thr_value_2 *THEN* '3 Minor'::*text* --minor

*WHEN* thr_value_1 < thr_value_4 *and* v_measure_value >=
thr_value_1 *THEN* '4 Major'::*text* --major

*WHEN* thr_value_1 < thr_value_4 *and* v_measure_value <
thr_value_1 *THEN* '5 Critical'::*text* --critical

*ELSE* '6 Unk'::*text* -- null values

*end*::*text*

*$function$*

;

*Da:* Merlin Moncure <mmoncure@gmail.com>
*Inviato:* lunedì 25 aprile 2022 21:24
*A:* Federico Travaglini <federico.travaglini@aubay.it>
*Cc:* pgsql-bugs <pgsql-bugs@lists.postgresql.org>
*Oggetto:* Re: 14.1 immutable function, bad performance if check number =
'NaN'

On Mon, Apr 25, 2022 at 11:58 AM Federico Travaglini <
federico.travaglini@aubay.it> wrote:

Good evening, and thanks to your excellent Postgres.

This funcion in used as a column in a select on about 400k records

If I leave the highlighted row it takes 27 seconds, otherwise 14 seconds!
Such behaviour looks not to be reasonable.

merlin

Merlin Moncure

mmoncure@gmail.com

about 4 years ago

In reply to: Federico Travaglini (#5)

Re: 14.1 immutable function, bad performance if check number = 'NaN'

On Tue, Apr 26, 2022 at 2:45 AM Federico Travaglini
<federico.travaglini@collaboration.aubay.it> wrote:

Good morning, thank you very much for the time you spent for my question.

Buffers: shared hit=365255

-> Seq Scan on geo_ants.file_hist fh (cost=0.00..443.28 rows=311 width=8) (actual time=0.698..1.434 rows=315 loops=1)

Output: fh.file_id, fh.file_name, fh.rtu, fh.port, fh.act_code, fh.file_size, fh.file_tms, fh.loaded_tms, fh.update_tms, fh.status, fh.data_min_tms, fh.data_max_tms, fh.enh_tms, fh.file_type, fh.partial_output_flag, fh.record_count, fh.status_description, fh.act_lenght, fh.act_id, fh.file_act_done, fh.enh_start_tms, fh.agn_code, fh.agn_group_id, fh.ts_sched_id, fh.ts_sched_ver, fh.enh_attempt, fh.act_done_list, fh.data_max_proc_tms, fh.data_max_loaded_tms, fh.error_count, fh.dbg_mode

Filter: ((fh.data_min_tms <= '2022-04-25 00:00:00'::timestamp without time zone) AND (fh.data_max_tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (fh.agn_group_id = 21))

Rows Removed by Filter: 3358

Buffers: shared hit=379

-> Append (cost=0.43..4609.77 rows=57257 width=1552) (actual time=0.012..9.971 rows=1319 loops=315)

Buffers: shared hit=106416

-> Index Scan using geo_measr_sample_2022_02_act_id_tms_idx on geo_ants.geo_measr_sample_2022_02 e_1 (cost=0.43..14.42 rows=166 width=1362) (actual time=0.003..0.003 rows=0 loops=315)

Output: e_1.tms, e_1.measure_list, e_1.act_id

Index Cond: ((e_1.act_id = fh.act_id) AND (e_1.tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (e_1.tms <= '2022-04-25 00:00:00'::timestamp without time zone))

Filter: (((e_1.measure_list #>> '{act_edit,s}'::text[]) <> 'excld'::text) OR ((e_1.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

Buffers: shared hit=946

-> Index Scan using geo_measr_sample_2022_03_act_id_tms_idx on geo_ants.geo_measr_sample_2022_03 e_2 (cost=0.56..2333.98 rows=30845 width=1552) (actual time=0.006..7.586 rows=1061 loops=315)

Output: e_2.tms, e_2.measure_list, e_2.act_id

Index Cond: ((e_2.act_id = fh.act_id) AND (e_2.tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (e_2.tms <= '2022-04-25 00:00:00'::timestamp without time zone))

Filter: (((e_2.measure_list #>> '{act_edit,s}'::text[]) <> 'excld'::text) OR ((e_2.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

Rows Removed by Filter: 3

Buffers: shared hit=75873

-> Index Scan using geo_measr_sample_2022_04_act_id_tms_idx on geo_ants.geo_measr_sample_2022_04 e_3 (cost=0.43..1975.08 rows=26246 width=1557) (actual time=0.005..2.232 rows=258 loops=315)

Output: e_3.tms, e_3.measure_list, e_3.act_id

Index Cond: ((e_3.act_id = fh.act_id) AND (e_3.tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (e_3.tms <= '2022-04-25 00:00:00'::timestamp without time zone))

Filter: (((e_3.measure_list #>> '{act_edit,s}'::text[]) <> 'excld'::text) OR ((e_3.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

Buffers: shared hit=29597

Query Identifier: -6803725219970975357

Planning:

Buffers: shared hit=933

Planning Time: 2.057 ms

Execution Time: 33677.292 ms

can you paste query plan for 'fast' case, thank you

merlin

Tom Lane

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Federico Travaglini (#5)

Re: R: 14.1 immutable function, bad performance if check number = 'NaN'

Federico Travaglini <federico.travaglini@collaboration.aubay.it> writes:

Here it is what I tested. I’s a code fragment from a bigger procedure. The
strings in green are passed as parameters, as well as the thresholds
1,2,3,4,5. To test just this fragment of code I replaced them with fixed
values

Is that different from what you do normally?

In this example, the function clearly is getting inlined, which means that
the parameter values are potentially evaluated multiple times:

antsgeo_get_severity_thr((e.measure_list #> ('{' ||
'cluster_comuni_italiani' || ',o}')::*text*[])::*numeric*, 1, 2, 3, 4, 5)
*AS* severity_1,

expands to

CASE WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '4'::double precision) THEN '1 Clear'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '3'::double precision) THEN '2 Warning'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '2'::double precision) THEN '3 Minor'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision >= '1'::double precision) THEN '4 Major'::text WHEN
((((e.measure_list #>
('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double
precision < '1'::double precision) THEN '5 Critical'::text ELSE '6
Unk'::text END,

That seems pretty inefficient, becase #> isn't the fastest thing
in the world. Maybe the speed differential you're seeing is just
from adding one more evaluation of the #> for the NaN test.

So my advice is to fix things so that #> isn't evaluated multiple
times. There are ways to prevent the inlining from happening but
they're all underdocumented hacks. A more reliable fix would be to
convert the function to plpgsql language.

regards, tom lane

14.1 immutable function, bad performance if check number = 'NaN'

Attachments: