RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Hi,
When analyzing the plan of a query I often find myself questioning
whether an additional index may be sensible, or if it is sensible that a
SeqScan is used if an index is available.
The current EXPLAIN ANALYZE only shows the number of tuples matching the
qualifier of an SeqScan Node - for analyzing the above situation it is
at least equally interesting how many tuples were read and discarded.
Therefore I produced a patch which adds a 'discarded=%f' part to the
analyze output.
As this is only a RFD the implementation is a bit hackish at the moment
- the discarded counter is increased in execScan directly instead of a
helper routine in instrument.c.
Also the discarded count is displayed in other node types as well - for
some there might be a sensible semantic meaning to it...
Good idea - Bad idea?
Greetings,
Andres
...
Attachments:
0001-Feature-discarded-tuple-count-display-in-EXPLAIN-ANA.patchtext/x-diff; name=0001-Feature-discarded-tuple-count-display-in-EXPLAIN-ANA.patchDownload
Andres Freund wrote:
When analyzing the plan of a query I often find myself questioning
whether an additional index may be sensible, or if it is sensible that a
SeqScan is used if an index is available.The current EXPLAIN ANALYZE only shows the number of tuples matching the
qualifier of an SeqScan Node - for analyzing the above situation it is
at least equally interesting how many tuples were read and discarded.Therefore I produced a patch which adds a 'discarded=%f' part to the
analyze output.
As this is only a RFD the implementation is a bit hackish at the moment
- the discarded counter is increased in execScan directly instead of a
helper routine in instrument.c.
Also the discarded count is displayed in other node types as well - for
some there might be a sensible semantic meaning to it...Good idea - Bad idea?
Isn't the discarded count always equal to (# of rows in table - matched
tuples)? Seems pretty redundant to me.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Hi,
On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
Andres Freund wrote:
When analyzing the plan of a query I often find myself questioning
whether an additional index may be sensible, or if it is sensible that
a SeqScan is used if an index is available.The current EXPLAIN ANALYZE only shows the number of tuples matching
the qualifier of an SeqScan Node - for analyzing the above situation
it is at least equally interesting how many tuples were read and
discarded.
Good idea - Bad idea?Isn't the discarded count always equal to (# of rows in table - matched
tuples)? Seems pretty redundant to me.
Not for EXISTS(), LIMIT and similar.
Also when looking at more complex plans its quite a nuisance to go
through all participating tables and do a separate count(*). Especially
its not your plan but some clients plan etc.
Andres
Andres Freund <andres@anarazel.de> writes:
On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
Isn't the discarded count always equal to (# of rows in table - matched
tuples)? Seems pretty redundant to me.
Not for EXISTS(), LIMIT and similar.
It doesn't really seem useful enough to justify breaking client-side
code that looks at EXPLAIN output.
This sort of ties into the discussions we have periodically about
allowing EXPLAIN to output XML or some other more-machine-friendly
data format. The barrier for adding additional output fields would
be a lot lower in such a format.
regards, tom lane
On Fri, May 22, 2009 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It doesn't really seem useful enough to justify breaking client-side
code that looks at EXPLAIN output.
Fwiw at least pgadmin I don't think would be confused by this. These
tool authors aren't enamoured of fragile assumptions and the
maintenance headaches they cause either.
This sort of ties into the discussions we have periodically about
allowing EXPLAIN to output XML or some other more-machine-friendly
data format. The barrier for adding additional output fields would
be a lot lower in such a format.
This is still pretty much true if only for the sheer unscalability of
the amount of data being presented for users to sift through. I do
want us to add a ton more instrumentation into the explain plan and
this is only one small addition. If we add number of hard and soft
i/os, time spent in user and system space, etc the result would be
pretty unreadable and they're at least as important as things like
this.
--
greg
Hi,
On 05/22/2009 05:54 PM, Tom Lane wrote:
This sort of ties into the discussions we have periodically about
allowing EXPLAIN to output XML or some other more-machine-friendly
data format. The barrier for adding additional output fields would
be a lot lower in such a format.
So the best thing would be to work on that front...
Tom (Raney), did you further work on your XML explain patch? Could you
use help?
Greetings,
Andres