inputs into query planner costing
Hi there,
I'm having an issue with query performance between 2 different pgsql
environments.
The first is our current production Postgres server, which is running 9.3.5
on CentOS 5 x64. The second system is Amazon's RDS Postgres as a service.
On our local DB server we have a query that executes in a reasonable amount
of time (600 msec). On RDS the same query will run for more than 10 minutes on
a similarly CPU-specced system. I've been working through this with Amazon
support and I'm looking for more suggestions on where to look (both for me
and to direct Amazon). The RDS system does use a network filesystem while
our production server is a local RAID10 array; I can see that affecting the
actual performance of the query but not the query planner costing (unless
there's an input to query planner costing that I can't find).
The query plan costs generated by the 2 systems are vastly different, while
the plans themselves are basically identical other than the materialization
that RDS is doing (if I disable the materialization then they are almost
the same, other than a seq scan/heap scan on one small <2000 row table).
All the tables in the query have been analyzed on each server without any
impact.
Current Production
Explain:
http://explain.depesz.com/s/Tkyc
Explain Analyze
http://explain.depesz.com/s/UnQt
RDS: (with enable_material=off)
http://explain.depesz.com/s/vDiV
(with enable_material=on)
http://explain.depesz.com/s/HUjx
I have validated that all the query planning configuration variables on
this page
http://www.postgresql.org/docs/9.3/static/runtime-config-query.html are the
same between the 2 environments. If I modify the local production system
values for things like random_page_cost and seq_page_cost to absurd values
like 60000 I can get it to generate a similar planner cost. Similarly, if I
lower the RDS values to absurdly low values like .00000001 I can get it to
generate a similarly costed plan (while still performing horridly).
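For anyone comparing environments like this, the planner-relevant settings (and where each value came from) can be pulled straight from pg_settings rather than eyeballed from postgresql.conf; the exact list of names below is just a sample of the cost-related parameters. Run this on both systems and diff the output:

```sql
-- Show planner cost settings plus where each value was set
-- (default, configuration file, database, user, session, ...).
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
               'cpu_index_tuple_cost', 'cpu_operator_cost',
               'effective_cache_size', 'work_mem',
               'default_statistics_target')
ORDER BY name;
```

The source column is the useful part: a value of "database" or "user" means an override that a plain look at postgresql.conf would miss.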
I've reached the end of things I can think about (I'm also working on
rewriting the query, but it's a generated query out of an infrastructure
component so it's not a simple change).
Just looking for any ideas on additional things to look into.
The query is available here:
https://www.dropbox.com/s/m31ct6k0mod0576/simplifiedquery.sql?dl=0
--
Data's inconvenient when people have opinions.
On 3/31/2015 10:31 AM, Mike Roest wrote:
Seems like there is no usable index on table timesheet. It always
seems to table scan all 99K rows, several times.
I'll bet on RDS that table scan is super slow.
The RDS plans seem to be just explain? I assume it's too slow to run an
explain analyze on? Would be neat to see explain analyze from RDS.
(Any way to add a little extra where-clause magic to cut the rows down to a
usable, but still slow, sample?)
On this one:
http://explain.depesz.com/s/UnQt
Line 11 table scans 99K rows, then all those rows are carried up the
chain (lines 10, 9, 8, 7 and 6). Any way to reduce the row count
earlier? Line 5 finally seems to filter out 94K rows.
Would be neat to see if these are buffered reads or are actually hitting
disk too. (something like EXPLAIN (ANALYZE, BUFFERS) select...)
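Something along these lines would show it (the table name comes from the posted plan, but the filter column and value here are hypothetical stand-ins for whatever would cut the row count down):

```sql
-- BUFFERS adds shared hit/read counts to each plan node, so you can
-- tell whether a scan was served from cache or actually hit disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM timesheet            -- table from the posted plan
WHERE userid = 42;        -- hypothetical filter to shrink the sample
```

"shared hit" means the page was already in shared_buffers; "read" means it had to come from the OS (and possibly the network filesystem on RDS).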
-Andy
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Mike Roest <mike.roest@replicon.com> writes:
Last I checked, there was not any magic pixie dust in the planner ;-).
Your results have to be explained by one or more of these things:
1. Not same version of Postgres between the two systems.
2. Not same planner parameter settings.
3. Different physical table sizes.
4. Different ANALYZE statistics.
As for #1, I have no idea whether Amazon RDS runs a purely stock Postgres
release or has some custom modifications of their own, but it'd be worth
asking about that.
As for #2, you say you checked that, but I'm dubious. In particular this
discrepancy:
Index Scan using uix2pdas_userpolicy on policydataaccessscope policydataaccessscope31 (cost=0.28..8.30 rows=1 width=16)
Index Scan using uix2pdas_userpolicy on policydataaccessscope policydataaccessscope31 (cost=0.28..4.30 rows=1 width=16)
is hard to explain unless the second system is using a smaller
random_page_cost than the first. Maybe somebody used ALTER ROLE SET
or ALTER DATABASE SET to adjust parameters in a way that only affects
some roles/databases?
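Per-role and per-database overrides set with ALTER ROLE SET / ALTER DATABASE SET are recorded in the pg_db_role_setting catalog, so a quick way to check for them is (a sketch; run it on both systems):

```sql
-- List any per-database or per-role parameter overrides.
SELECT COALESCE(d.datname, '(all databases)') AS database,
       COALESCE(r.rolname, '(all roles)')     AS role,
       s.setconfig                            AS overrides
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles    r ON r.oid = s.setrole;
```

An empty result means no such overrides exist; a row with random_page_cost in setconfig would explain the index scan cost discrepancy.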
I suspect that the large differences in some of the seqscan costs might be
explainable by #3, ie those tables are bloated with lots of empty space on
one system but not the other. Comparing pg_relation_size() would be the
way to find out.
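A sketch of that size comparison (the schema name is an assumption; adjust to wherever the query's tables live):

```sql
-- Compare physical table sizes between the two systems; bloat on one
-- but not the other would explain large seqscan cost differences.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'      -- adjust schema as needed
  AND c.relkind = 'r'
ORDER BY pg_relation_size(c.oid) DESC;
```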
I mention #4 for completeness. ANALYZE uses random sampling, so it's
expectable that the data distribution stats would be a bit different on
the two systems, but large differences that persist across multiple
ANALYZE attempts are unlikely. (Although ... you do have the
same default_statistics_target on both systems, no? Table-specific
statistics targets could be a gotcha as well.)
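Table-specific targets (set with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS) show up in pg_attribute.attstattarget; on 9.3 anything other than -1 is an override of the default:

```sql
-- Find columns whose statistics target differs from the default (-1).
SELECT attrelid::regclass AS table_name,
       attname            AS column_name,
       attstattarget
FROM pg_attribute
WHERE attstattarget <> -1
  AND attnum > 0
  AND NOT attisdropped;
```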
regards, tom lane
Thanks for the responses
For anyone searching in the future I'll answer Tom's questions and list the
boneheaded fix that it ended up actually being (really painful as I've been
fighting this for a week).
1) According to amazon they run stock postgres as far as the query planner
is concerned.
2) Yes, sorry, I forgot to note that on our prod system random_page_cost was 2
vs 4 on the RDS system.
3) I had run vacuum on all the tables in the query, and the pg_relation_size
on the tables isn't way out of whack.
4) Yep, default_statistics_target was the default of 100 on both.
I was concentrating completely in the wrong direction here. What it turned
out to be was the RDS configuration of Postgres, which we had modified
somewhat: I had missed configuring work_mem to something greater than their
default of 1 MB. Once I brought work_mem up to the same value as our
production server, lo and behold, the query runs fast.
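For anyone hitting the same thing: work_mem can be checked and tried per session before making the permanent change in the RDS parameter group (the 64MB value below is just an example; match whatever your production server uses):

```sql
SHOW work_mem;                  -- the RDS default here was only 1MB

-- Raise it for this session only and re-test; the permanent change
-- belongs in the RDS parameter group, not postgresql.conf on RDS.
SET work_mem = '64MB';
EXPLAIN ANALYZE
SELECT count(*) FROM timesheet; -- stand-in for the real slow query
```

Too small a work_mem forces sorts and hashes to spill to disk (or pick worse plan shapes like the materialize node seen here), which is exactly the kind of thing that is far more painful on network storage.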
Sorry for wasting everyone's time. Hopefully this will help someone else
down the line.