Regarding performance regression on specific query
Hello,
I am Jinho Jung, a PhD student at Georgia Tech conducting DB performance research. I am sending our evaluation results to get feedback on our findings.
We found several queries that show performance regressions. Attached are three of them for which we confirmed a regression (in terms of query execution time) between v10.6 and v9.4.20. We hope you can test these queries and give us feedback. For your information, we are also sending the execution times measured on our machine.
Execution time
=============
1.sql
10.6 : 469 ms
9.4.20: 10 ms
4.sql
10.6 : 34019 ms
9.4.20: 0.4 ms
20.sql
10.6 : 2791 ms
9.4.20: 61 ms
Evaluation setup
==============
1) compile 9.4.20 and 10.6 from the released source code (https://www.postgresql.org/ftp/source/)
2) without importing any additional data, run the attached queries
We appreciate you taking the time to test!
Best regards,
Jinho Jung
Attachment: 1.sql (application/sql)
EXPLAIN ANALYZE
WITH
jennifer_0 AS (select
subq_0.c3 as c0,
subq_0.c6 as c1,
subq_0.c0 as c2,
subq_0.c3 as c3,
(select pg_catalog.max(idx_scan) from pg_catalog.pg_stat_user_indexes)
as c4,
case when subq_0.c6 is NULL then pg_catalog.timeofday() else pg_catalog.timeofday() end
as c5
from
(select
ref_0.typanalyze as c0,
ref_0.typispreferred as c1,
ref_0.typtypmod as c2,
ref_0.typrelid as c3,
ref_0.typcollation as c4,
ref_0.typdelim as c5,
ref_0.typnotnull as c6
from
pg_catalog.pg_type as ref_0
where case when ((cast(null as anyrange) <> cast(null as anyrange))
and (((cast(null as path) @> cast(null as point))
or ((ref_0.typtypmod <= ref_0.typndims)
and ((ref_0.typtype is not NULL)
and (((false)
and (cast(null as tsvector) @@@ cast(null as tsquery)))
or (cast(null as record) *<> cast(null as record))))))
and (cast(null as box) ?# cast(null as box))))
and (ref_0.typname >= ref_0.typname) then cast(null as polygon) else cast(null as polygon) end
<@ pg_catalog.polygon(
cast(ref_0.typndims as int4),
cast(cast(null as circle) as circle))
limit 45) as subq_0
where (pg_catalog.tintervalin(
cast(case when (true)
and ((cast(null as circle) |>> cast(null as circle))
and (true)) then cast(null as cstring) else cast(null as cstring) end
as cstring)) #> case when subq_0.c2 is not NULL then case when true then cast(null as reltime) else cast(null as reltime) end
else case when true then cast(null as reltime) else cast(null as reltime) end
end
)
or ((select agginitval from pg_catalog.pg_aggregate limit 1 offset 5)
>= (select provider from pg_catalog.pg_shseclabel limit 1 offset 6)
))
select
subq_1.c0 as c0,
subq_1.c0 as c1,
subq_1.c0 as c2,
subq_1.c1 as c3,
subq_1.c1 as c4,
subq_1.c0 as c5
from
(select
ref_1.relname as c0,
ref_1.relname as c1
from
pg_catalog.pg_statio_sys_sequences as ref_1
inner join information_schema.role_routine_grants as ref_2
on (ref_1.blks_read >= ref_1.blks_hit)
inner join pg_catalog.pg_type as ref_3
right join information_schema.user_mapping_options as ref_4
on ((select seq_scan from pg_catalog.pg_stat_xact_sys_tables limit 1 offset 2)
<= ref_3.typlen)
on (ref_2.specific_name = ref_4.authorization_identifier )
where ref_1.blks_hit is not NULL) as subq_1
where true
limit 107
;

Hi,
On 2018/11/20 2:49, Jung, Jinho wrote:
Execution time
=============
1.sql
10.6 : 469 ms
9.4.20: 10 ms
4.sql
10.6 : 34019 ms
9.4.20: 0.4 ms
I noticed that these two are fixed by running ANALYZE in the database in
which these queries are run.
20.sql
10.6 : 2791 ms
9.4.20: 61 ms
This one may be suffering from a more serious planning issue, as doing
ANALYZE didn't help for this one. Will have to look closely at how the
plan is changing for worse.
Thanks,
Amit
Thanks for the test.
We are wondering how ANALYZE mitigated the regression for queries "1.sql" and "4.sql".
We followed this procedure but still observe the performance regression:
1) run ANALYZE on used table_name
analyze pg_catalog.pg_ts_parser;
analyze information_schema.column_options;
analyze pg_catalog.pg_aggregate;
analyze pg_catalog.pg_inherits;
analyze pg_catalog.pg_aggregate;
analyze pg_catalog.pg_rewrite;
analyze pg_catalog.pg_stat_user_indexes;
analyze pg_catalog.pg_stat_user_tables;
analyze pg_catalog.pg_attribute;
analyze information_schema.column_privileges;
analyze pg_catalog.pg_user_mapping;
analyze pg_catalog.pg_type;
analyze pg_catalog.pg_shseclabel;
analyze pg_catalog.pg_statio_sys_sequences;
analyze information_schema.role_routine_grants;
analyze pg_catalog.pg_type;
analyze information_schema.user_mapping_options;
analyze pg_catalog.pg_stat_xact_sys_tables;
2) execute the same query
We have more cases. Do you think we should report them through the bug report website? (https://www.postgresql.org/account/login/?next=/account/submitbug/)
Jinho Jung
________________________________
From: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Sent: Tuesday, November 20, 2018 2:47:54 AM
To: Jung, Jinho; pgsql-hackers@postgresql.org
Subject: Re: Regarding performance regression on specific query
On Tue, Nov 20, 2018 at 10:08 PM Jung, Jinho <jinho.jung@gatech.edu> wrote:
We are wondering how ANALYZE mitigated regression from query "1.sql" and "4.sql".
We followed this procedure but still observe performance regression:
1) run ANALYZE on used table_name
[ list of ANALYZE statements ]
You can run ANALYZE without explicitly specifying any table name, so
that you don't miss any.
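As a sketch of that suggestion, a single database-wide ANALYZE covers every table in one statement (run it as a superuser so the system catalogs are included as well):

```sql
-- With no table name, ANALYZE refreshes planner statistics for
-- every table in the current database that the user may analyze.
ANALYZE;

-- Optionally, report each table as it is processed:
ANALYZE VERBOSE;
```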
We have more cases. Do you think we should report them through the bug report website? (https://www.postgresql.org/account/login/?next=/account/submitbug/)
You can send an email to pgsql-performance mailing list.
https://www.postgresql.org/list/pgsql-performance/
Thanks,
Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2018/11/20 2:49, Jung, Jinho wrote:
[ assorted queries ]
I noticed that these two are fixed by running ANALYZE in the database in
which these queries are run.
That didn't help much for me. What did help was increasing
join_collapse_limit and from_collapse_limit to not limit the
join search space --- on queries with as many input relations
as these, you're really at the mercy of whether the given query
structure represents a good join order if you don't.
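For reference, lifting that limit for a session is a two-setting change; the value 1000 below is just an arbitrarily large cap chosen for illustration:

```sql
-- Let the planner collapse arbitrarily long FROM/JOIN lists and
-- search join orders itself, instead of being bound by the order
-- written in the query text. These settings are session-local.
SET join_collapse_limit = 1000;
SET from_collapse_limit = 1000;
-- Then re-run EXPLAIN ANALYZE on the regressed queries.
```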
In general I can't get even a little bit excited about the quality of the
plans selected for these examples, as they all involve made-up restriction
and join clauses that the planner isn't going to have the slightest clue
about. The observations boil down to "9.4 made one set of arbitrary plan
choices, while v10 made a different set of arbitrary plan choices, and on
these particular examples 9.4 got lucky and 10 didn't".
Possibly also worth noting is that running these in an empty database
is in itself kind of a worst case, because many of the tables are empty
to start with (or the restriction/join clauses pass no rows), and so
the fastest runtime tends to go to plans of the form "nestloop with
empty relation on the outside and all the expensive stuff on the
inside". (Observe all the "(never executed)" notations in the EXPLAIN
output.) This kind of plan wins only when the outer rel is actually
empty, otherwise it can easily lose big, and therefore PG's planner is
intentionally designed to discount the case entirely. We never believe
that a relation is empty, unless we can mathematically prove that, and
our cost estimates are never made with an eye to exploiting such cases.
This contributes a lot to the random-chance nature of which plan is
actually fastest; the planner isn't expecting "(never executed)" to
happen and doesn't prefer plans that will win if it does happen.
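The effect can be reproduced in isolation with made-up temp tables (a sketch; whether a nested loop is actually chosen depends on the planner's estimates):

```sql
-- If the planner picks a nested loop with empty_t on the outside,
-- EXPLAIN ANALYZE marks the inner scan "(never executed)", because
-- the empty outer relation produces no rows to probe with.
CREATE TEMP TABLE empty_t (id int);
CREATE TEMP TABLE big_t AS
  SELECT g AS id FROM generate_series(1, 100000) g;

EXPLAIN ANALYZE
SELECT *
FROM empty_t e
JOIN big_t b ON e.id = b.id;
```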
regards, tom lane
Thanks for the comment.
We also have several performance-regression cases that we found in the TPC-C benchmark. Since those queries are not executed on empty relations, they should be more interesting.
We will report to pgsql-performance mailing list next time.
Jinho Jung
________________________________
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Saturday, November 24, 2018 3:32:41 PM
To: Amit Langote
Cc: Jung, Jinho; pgsql-hackers@postgresql.org
Subject: Re: Regarding performance regression on specific query