Planning time in explain/explain analyze
Hi,
A user asked in -performance[1] if there is a way to measure the
planning time. Other than measuring it in the client I do not think
there is so I quickly hacked a patched which adds "Planning time" to the
outputs of EXPLAIN and EXPLAIN ANALYZE.
Is this something useful? I think it is, since plan time can become an
issue for complex queries.
A couple of questions about the path: Should the planning time be added
to both EXPLAIN and EXPLAIN ANALYZE? And is the output obvious enough?
The patch does not include any changes to documentation or tests. I will
fix that if people think this patch is useful.
$ EXPLAIN SELECT * FROM pg_stat_activity;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Nested Loop (cost=1.15..2.69 rows=1 width=337)
-> Hash Join (cost=1.02..2.41 rows=1 width=273)
Hash Cond: (s.usesysid = u.oid)
-> Function Scan on pg_stat_get_activity s (cost=0.00..1.00
rows=100 width=209)
-> Hash (cost=1.01..1.01 rows=1 width=68)
-> Seq Scan on pg_authid u (cost=0.00..1.01 rows=1
width=68)
-> Index Scan using pg_database_oid_index on pg_database d
(cost=0.13..0.27 rows=1 width=68)
Index Cond: (oid = s.datid)
Planning time: 0.258 ms
(9 rows)
$ EXPLAIN ANALYZE SELECT * FROM pg_stat_activity;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.15..2.69 rows=1 width=337) (actual
time=0.094..0.096 rows=1 loops=1)
-> Hash Join (cost=1.02..2.41 rows=1 width=273) (actual
time=0.078..0.079 rows=1 loops=1)
Hash Cond: (s.usesysid = u.oid)
-> Function Scan on pg_stat_get_activity s (cost=0.00..1.00
rows=100 width=209) (actual time=0.053..0.053 rows=1 loops=1)
-> Hash (cost=1.01..1.01 rows=1 width=68) (actual
time=0.014..0.014 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on pg_authid u (cost=0.00..1.01 rows=1
width=68) (actual time=0.007..0.009 rows=1 loops=1)
-> Index Scan using pg_database_oid_index on pg_database d
(cost=0.13..0.27 rows=1 width=68) (actual time=0.009..0.010 rows=1 loops=1)
Index Cond: (oid = s.datid)
Planning time: 0.264 ms
Total runtime: 0.158 ms
(11 rows)
Links
1.
/messages/by-id/CACfv+pKNEmbQyJPCqRGsVmc_HvRGAi3d_Ge893N8qbX+ymhodA@mail.gmail.com
--
Andreas Karlsson
Attachments:
explainplantime-v1.difftext/x-patch; name=explainplantime-v1.diffDownload+31-16
Andreas Karlsson <andreas@proxel.se> writes:
The patch does not include any changes to documentation or tests. I will
fix that if people think this patch is useful.
I take it you've not tried the regression tests with this.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23-12-2013 22:12, Andreas Karlsson wrote:
A user asked in -performance[1] if there is a way to measure the
planning time. Other than measuring it in the client I do not think
there is so I quickly hacked a patched which adds "Planning time" to the
outputs of EXPLAIN and EXPLAIN ANALYZE.Is this something useful? I think it is, since plan time can become an
issue for complex queries.
Yes, but a couple of years ago, this same feature was proposed as a
module [1]/messages/by-id/BANLkTi=sxKTCTcv9hsACu38eaj=fW_4OMA@mail.gmail.com[2]https://github.com/umitanuki/planinstr. I'm not sure if it is worth including as an additional
module or not. Opinions?
[1]: /messages/by-id/BANLkTi=sxKTCTcv9hsACu38eaj=fW_4OMA@mail.gmail.com
/messages/by-id/BANLkTi=sxKTCTcv9hsACu38eaj=fW_4OMA@mail.gmail.com
[2]: https://github.com/umitanuki/planinstr
--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12/24/2013 03:33 AM, Tom Lane wrote:
Andreas Karlsson <andreas@proxel.se> writes:
The patch does not include any changes to documentation or tests. I will
fix that if people think this patch is useful.I take it you've not tried the regression tests with this.
Yeah, forgot to mention that we need some way to disable it in the
tests. Either by not having it included in EXPLAIN or by adding an
option to turn it off. Any suggestion on which would be preferable?
--
Andreas Karlsson
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Dec 23, 2013 at 9:54 PM, Andreas Karlsson <andreas@proxel.se> wrote:
On 12/24/2013 03:33 AM, Tom Lane wrote:
Andreas Karlsson <andreas@proxel.se> writes:
The patch does not include any changes to documentation or tests. I will
fix that if people think this patch is useful.I take it you've not tried the regression tests with this.
Yeah, forgot to mention that we need some way to disable it in the tests.
Either by not having it included in EXPLAIN or by adding an option to turn
it off. Any suggestion on which would be preferable?
I would be tempted to display it only if (COSTS OFF) is not given. As
far as I can tell, the major use case for (COSTS OFF) is when you want
the output to be stable so you can include it in a regression test.
Technically speaking, planning time is not a cost, but I'm not sure I
could live with myself if we forced everyone to write (COSTS OFF,
PLANNING_TIME OFF). And I don't think much of the idea of only
including planning time when ANALYZE is used, because you don't have
to want to run the query to want to know how long it took to plan.
Also, +1 for this general concept. Great idea.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Dec 23, 2013 at 9:54 PM, Andreas Karlsson <andreas@proxel.se> wrote:
Yeah, forgot to mention that we need some way to disable it in the tests.
Either by not having it included in EXPLAIN or by adding an option to turn
it off. Any suggestion on which would be preferable?
I would be tempted to display it only if (COSTS OFF) is not given.
Yeah. The alternatives seem to be:
1. Change a lot of regression tests. This would be a serious PITA,
not so much for the one-time cost as for every time we needed to
back-patch a regression test that includes explain output. -1.
2. Don't display the planning time by default, necessitating a new
PLANNING_TIME ON option. This has backwards compatibility to
recommend it, but not much else.
3. Let COSTS OFF suppress it.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Dec 23, 2013 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Dec 23, 2013 at 9:54 PM, Andreas Karlsson <andreas@proxel.se> wrote:
Yeah, forgot to mention that we need some way to disable it in the tests.
Either by not having it included in EXPLAIN or by adding an option to turn
it off. Any suggestion on which would be preferable?I would be tempted to display it only if (COSTS OFF) is not given.
Yeah. The alternatives seem to be:
1. Change a lot of regression tests. This would be a serious PITA,
not so much for the one-time cost as for every time we needed to
back-patch a regression test that includes explain output. -1.2. Don't display the planning time by default, necessitating a new
PLANNING_TIME ON option. This has backwards compatibility to
recommend it, but not much else.3. Let COSTS OFF suppress it.
Another option would be to not display it by default, but make VERBOSE
show it. However, I think making it controlled by COSTS is a better
fit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12/23/13, 8:12 PM, Andreas Karlsson wrote:
A user asked in -performance[1] if there is a way to measure the
planning time.
log_planner_stats
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12/24/2013 02:31 PM, Peter Eisentraut wrote:
On 12/23/13, 8:12 PM, Andreas Karlsson wrote:
A user asked in -performance[1] if there is a way to measure the
planning time.log_planner_stats
Thanks for the suggestion, I will use that if I need to know the plan
times. But I do not think this is as useful as having it in the EXPLAIN
output.
--
Andreas Karlsson
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12/24/2013 05:09 AM, Robert Haas wrote:
Yeah. The alternatives seem to be:
1. Change a lot of regression tests. This would be a serious PITA,
not so much for the one-time cost as for every time we needed to
back-patch a regression test that includes explain output. -1.2. Don't display the planning time by default, necessitating a new
PLANNING_TIME ON option. This has backwards compatibility to
recommend it, but not much else.3. Let COSTS OFF suppress it.
Another option would be to not display it by default, but make VERBOSE
show it. However, I think making it controlled by COSTS is a better
fit.
After some thinking COSTS OFF really seems to be the best option. I have
changed this in the patch and will submit a patch with updated
documentation as soon as I get the time. Adding "Planning time" to all
explain outputs (only 37 of them) in the documentation should not
clutter anything too much.
--
Andreas Karlsson
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
New version of the patch with updated documentation and which does not
display the planning time when the COSTS are off.
I will add it to the next commitfest.
--
Andreas Karlsson
Attachments:
explainplantime-v2.difftext/x-patch; name=explainplantime-v2.diffDownload+69-31
On Sat, Dec 28, 2013 at 11:32 PM, Andreas Karlsson <andreas@proxel.se> wrote:
New version of the patch with updated documentation and which does not
display the planning time when the COSTS are off.I will add it to the next commitfest.
I'm wondering whether the time should be stored inside the PlannedStmt
node instead of passing it around separately. One possible problem
with the way you've done things here is that, in the case of a
prepared statement, EXPLAIN ANALYZE will emit the time needed to call
GetCachedPlan(), even if that function didn't do any replanning. Now
you could argue that this is the correct behavior, but I think there's
a decent argument that what we ought to show there is the amount of
time that was required to create the plan that we're displaying at the
time it was created, rather than the amount of time that was required
to figure out that we didn't need to replan.
A minor side benefit of this approach is that you wouldn't need to
change the signature for ExplainOnePlan(), which would avoid breaking
extensions that may call it.
Also, I am inclined to think we ought to update the examples, rather
than say this:
+ rewriting and parsing. We will not include this line in later examples in
+ this section.
+ </para>
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/02/2014 04:08 AM, Robert Haas wrote:
I'm wondering whether the time should be stored inside the PlannedStmt
node instead of passing it around separately. One possible problem
with the way you've done things here is that, in the case of a
prepared statement, EXPLAIN ANALYZE will emit the time needed to call
GetCachedPlan(), even if that function didn't do any replanning. Now
you could argue that this is the correct behavior, but I think there's
a decent argument that what we ought to show there is the amount of
time that was required to create the plan that we're displaying at the
time it was created, rather than the amount of time that was required
to figure out that we didn't need to replan.
Since we support re-planning of prepared statements I would argue the
most useful thing is to print the time it took to fetch the old plan or
the create a new one as the planning time, but I am not a heavy user of
prepared statements.
Also, I am inclined to think we ought to update the examples, rather
than say this:+ rewriting and parsing. We will not include this line in later examples in + this section. + </para>
Ok, I will fix this in the next version of the patch.
--
Andreas Karlsson
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
A patch with updated documentation is attached.
On 01/02/2014 04:08 AM, Robert Haas wrote:
I'm wondering whether the time should be stored inside the PlannedStmt
node instead of passing it around separately. One possible problem
with the way you've done things here is that, in the case of a
prepared statement, EXPLAIN ANALYZE will emit the time needed to call
GetCachedPlan(), even if that function didn't do any replanning. Now
you could argue that this is the correct behavior, but I think there's
a decent argument that what we ought to show there is the amount of
time that was required to create the plan that we're displaying at the
time it was created, rather than the amount of time that was required
to figure out that we didn't need to replan.A minor side benefit of this approach is that you wouldn't need to
change the signature for ExplainOnePlan(), which would avoid breaking
extensions that may call it.
A possible argument against printing the time to create the plan is that
unless it was created when running EXPLAIN we will not know it. I do not
think we want to always measure the time it took to generate a plan due
to slow clocks on some architectures. Also I feel that such a patch
would be more invasive.
Just my reasoning for the current solution. I welcome any opinions about
how to print planning time for prepared statements since I am not a
heavy user of them.
--
Andreas Karlsson
Attachments:
explainplantime-v3.difftext/x-patch; name=explainplantime-v3.diffDownload+80-31
Andreas, Robert,
* Andreas Karlsson (andreas@proxel.se) wrote:
A patch with updated documentation is attached.
Thanks for working on this!
On 01/02/2014 04:08 AM, Robert Haas wrote:
I'm wondering whether the time should be stored inside the PlannedStmt
node instead of passing it around separately. One possible problem
with the way you've done things here is that, in the case of a
prepared statement, EXPLAIN ANALYZE will emit the time needed to call
GetCachedPlan(), even if that function didn't do any replanning. Now
you could argue that this is the correct behavior, but I think there's
a decent argument that what we ought to show there is the amount of
time that was required to create the plan that we're displaying at the
time it was created, rather than the amount of time that was required
to figure out that we didn't need to replan.
Agreed, and really, it'd be nice to know *both*. If we're worried about
the timing cost when going through a no-op GetCachedPlan(), then perhaps
we don't add that, but if we actually *do* re-plan, it'd be handy to
know that and to know the timing it took.
A minor side benefit of this approach is that you wouldn't need to
change the signature for ExplainOnePlan(), which would avoid breaking
extensions that may call it.
Agreed.
A possible argument against printing the time to create the plan is
that unless it was created when running EXPLAIN we will not know it.
This is perhaps the biggest point against- if we keep it this way...
I do not think we want to always measure the time it took to
generate a plan due to slow clocks on some architectures. Also I
feel that such a patch would be more invasive.
The slow-clock argument is really quite frustrating for those of us who
are looking to add more and more metrics to PG. We're nowhere near the
level that we need to be and it shows (particularly for users coming
from $OTHER-RDBMS). Perhaps we should try and come up with a solution
to address those cases (turn off metrics ala turning off stats?) while
not preventing us from gathering metrics on more reasonable systems.
Thanks,
Stephen
On Thu, Jan 9, 2014 at 1:50 PM, Stephen Frost <sfrost@snowman.net> wrote:
I do not think we want to always measure the time it took to
generate a plan due to slow clocks on some architectures. Also I
feel that such a patch would be more invasive.The slow-clock argument is really quite frustrating for those of us who
are looking to add more and more metrics to PG.
Especially since implementing gettimeofday quickly is a solved problem
on most architectures.
However I don't see the issue here. Two gettimeofday calls per query
plan is not really going to hurt even on systems where it's slow. The
problems we run into are explain analyze which runs getimeofday twice
for every node for every tuple processed. For cpu-bound queries that's
can become the dominant cost.
The only way two gettimeofday calls per query plan becomes an issue is
if you're executing non-cached queries repeatedly on data that's
entirely in ram. That means the query processing is entirely cpu-bound
and adding two syscalls could actually be noticeable. The mitigation
strategy would be to prepare and cache the query handle to execute it
again.
We're nowhere near the
level that we need to be and it shows (particularly for users coming
from $OTHER-RDBMS). Perhaps we should try and come up with a solution
to address those cases (turn off metrics ala turning off stats?) while
not preventing us from gathering metrics on more reasonable systems.
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Greg Stark <stark@mit.edu> writes:
However I don't see the issue here. Two gettimeofday calls per query
plan is not really going to hurt even on systems where it's slow.
I tend to agree with this, especially if the calls only occur when the
user asks for the information (ie, does an EXPLAIN rather than just
executing the query).
The only way two gettimeofday calls per query plan becomes an issue is
if you're executing non-cached queries repeatedly on data that's
entirely in ram. That means the query processing is entirely cpu-bound
and adding two syscalls could actually be noticeable. The mitigation
strategy would be to prepare and cache the query handle to execute it
again.
This point weighs against the proposal that we time the work to fetch
a previously-prepared query plan; if we do that then the "mitigation
strategy" doesn't mitigate anything.
In short then, I think we should just add this to EXPLAIN and be done.
-1 for sticking the info into PlannedStmt or anything like that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jan 9, 2014 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short then, I think we should just add this to EXPLAIN and be done.
-1 for sticking the info into PlannedStmt or anything like that.
I'm confused. I thought I was arguing to support your suggestion that
the initial planning store the time in the cached plan and explain
should output the time the original planning took.
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Greg Stark <stark@mit.edu> writes:
On Thu, Jan 9, 2014 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short then, I think we should just add this to EXPLAIN and be done.
-1 for sticking the info into PlannedStmt or anything like that.
I'm confused. I thought I was arguing to support your suggestion that
the initial planning store the time in the cached plan and explain
should output the time the original planning took.
Uh, no, wasn't my suggestion. Doesn't that design imply measuring *every*
planning cycle, explain or no? I was thinking more of just putting the
timing calls into explain.c.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jan 9, 2014 at 11:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Stark <stark@mit.edu> writes:
On Thu, Jan 9, 2014 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short then, I think we should just add this to EXPLAIN and be done.
-1 for sticking the info into PlannedStmt or anything like that.I'm confused. I thought I was arguing to support your suggestion that
the initial planning store the time in the cached plan and explain
should output the time the original planning took.Uh, no, wasn't my suggestion. Doesn't that design imply measuring *every*
planning cycle, explain or no? I was thinking more of just putting the
timing calls into explain.c.
Currently the patch includes changes to prepare.c which is what seems
odd to me. I think it'd be fine to say, hey, I can't give you the
planning time in this EXPLAIN ANALYZE because I just used a cached
plan and did not re-plan. But saying, hey, the planning time is
$TINYVALUE, when what we really mean is that looking up the
previously-cached plan took only that long, seems actively misleading
to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers