generic options for explain

Started by Robert Haasalmost 17 years ago75 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

Well, here we are! Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de! I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2]http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com. The current "EXPLAIN [ANALYZE] [VERBOSE] <query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords. Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax. See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

As written, this patch doesn't introduce any actual new functionality,
but I think it's pretty easy to see how we could build on the syntax
to add things like different types of output formats, different types
of instrumentation, etc. A few other random notes:

- This currently lacks documentation. If we have any consensus that
this is a reasonable approach, I'll add some.
- I noticed that we currently accept as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)). But
you can't put parentheses around any other type of statement. Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all. But that's neither here nor there as far as
this patch is concerned, except that it required some minor grammar
hackery and a long comment explaining the hackery.

Thoughts?

...Robert

[1]: http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de
[2]: http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com

Attachments:

explain_options.v1.patchtext/x-diff; charset=US-ASCII; name=explain_options.v1.patchDownload+72-5
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#1)
Re: generic options for explain

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

regards
Pavel Stehule

2009/5/24 Robert Haas <robertmhaas@gmail.com>:

Show quoted text

Well, here we are!  Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]!  I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2].  The current "EXPLAIN [ANALYZE] [VERBOSE] <query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords.  Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax.  See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

As written, this patch doesn't introduce any actual new functionality,
but I think it's pretty easy to see how we could build on the syntax
to add things like different types of output formats, different types
of instrumentation, etc.  A few other random notes:

- This currently lacks documentation.  If we have any consensus that
this is a reasonable approach, I'll add some.
- I noticed that we currently accept as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)).  But
you can't put parentheses around any other type of statement.  Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all.  But that's neither here nor there as far as
this patch is concerned, except that it required some minor grammar
hackery and a long comment explaining the hackery.

Thoughts?

...Robert

[1] http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de
[2] http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: generic options for explain

Hi Robert,

On 05/24/2009 02:47 AM, Robert Haas wrote:

Well, here we are! Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]! I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2]. The current "EXPLAIN [ANALYZE] [VERBOSE]<query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords. Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax. See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

Beeing the latest cause for the frustration leading to this patch I
obviously would like something like that - and I would gladly implement
some additional stats suggested by others(if implementable in a
reasonable timeframe) if this approach is agreed uppon.

- I noticed that we currently acce pt as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)). But
you can't put parentheses around any other type of statement. Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all.

I would guess that stems from supporting syntax like:
(SELECT 1)
UNION
(SELECT 2)
ORDER BY

and not wanting to introduce a special path for
(SELECT 1)
ORDER BY

For additional stats to be kept another discussion about appropriate,
extensible representation suitable for different output formats probably
would be needed - but thats a discussion for another day.

Andres

#4Dave Page
dpage@pgadmin.org
In reply to: Pavel Stehule (#2)
Re: generic options for explain

On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#5Robert Haas
robertmhaas@gmail.com
In reply to: Dave Page (#4)
Re: generic options for explain

On Sun, May 24, 2009 at 8:44 AM, Dave Page <dpage@pgadmin.org> wrote:

On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

I don't understand - why do we want to switch from a command interface
to a functional interface? I was thinking:

EXPLAIN (format 'xml') query...
EXPLAIN (format 'json') query...

What you're proposing will certainly be harder to code as well as more
different from what we have now. My approach has the advantage of
being able to leverage the main parser to parse both the query and the
options; with this approach, you'll need to pass the query text back
to the main parser and then write separate code of some sort to parse
the options. Seems like extra work for no gain.

The only possible advantage I can see of a functional interface is
that the current implementation of EXPLAIN is treated as a utility
command, so you can't do something like this:

INSERT INTO foo (EXPLAIN SELECT 1);

In my mind, however, fixing that would be preferable to (and probably
easier than) changing the whole syntax.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

Well, the major benefit of the existing EXPLAIN output is that you
don't need an XSLT stylesheet to read it. You can just be sitting
there in psql and do an EXPLAIN, and look at the results, and solve
your problem. As we add options to EXPLAIN, I would like to see us
support those in both the plaintext output and the XML output (and
hopefully the JSON output) using the same syntax.

If, for example, I need more information on what my hash joins are
doing, I'd like to be able to do:

EXPLAIN ('hash_detail', 'on') query...

...and have it tell me the # of buckets and batches for each hash join
and whatever other information might be useful in that context. I
*don't* want to have the solution to that problem be: run
explain_xml() on your query, then install an XSLT parser, then use
this pregenerated XSLT stylesheet to extract your data from the XML
document you got back from step #1, then view the resulting HTML in
your web browser. That may be a good solution for someone, but it
certainly doesn't sound convenient for anyone who works primarily at
the command line.

It's also worth noting that there are some kinds of statistics (for
example, I/O statistics) which can't be gathered without significantly
impacting the performance of the query. So you can't just take the
approach of turning on every possible instrumentation when XML output
is requested.

I do agree with your point that even with an extensible options
syntax, we can't just add an option for any old thing that someone
wants. We should try to come up with a somewhat comprehensive list of
types of instrumentation that someone might want and then group them
into categories, with one toggle per category.

...Robert

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#5)
Re: generic options for explain

2009/5/24 Robert Haas <robertmhaas@gmail.com>:

On Sun, May 24, 2009 at 8:44 AM, Dave Page <dpage@pgadmin.org> wrote:

On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

I don't understand - why do we want to switch from a command interface
to a functional interface?  I was thinking:

EXPLAIN (format 'xml') query...
EXPLAIN (format 'json') query...

Personally, I thing, so this format is strange - and is some MySQLism.
There are statements like SHOW TABLES and similar. My proposal doesn't
change of current EXPLAIN. The sence is don't more complicate current
syntax and do things simply. Function has more advantage then one:

a) don't need changes in parser
b) simply add new variants
c) simply add new parameters

What you're proposing will certainly be harder to code as well as more
different from what we have now.  My approach has the advantage of
being able to leverage the main parser to parse both the query and the
options; with this approach, you'll need to pass the query text back
to the main parser and then write separate code of some sort to parse
the options.  Seems like extra work for no gain.

The only possible advantage I can see of a functional interface is
that the current implementation of EXPLAIN is treated as a utility
command, so you can't do something like this:

INSERT INTO foo (EXPLAIN SELECT 1);

In my mind, however, fixing that would be preferable to (and probably
easier than) changing the whole syntax.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

Well, the major benefit of the existing EXPLAIN output is that you
don't need an XSLT stylesheet to read it.  You can just be sitting
there in psql and do an EXPLAIN, and look at the results, and solve
your problem.  As we add options to EXPLAIN, I would like to see us
support those in both the plaintext output and the XML output (and
hopefully the JSON output) using the same syntax.

If, for example, I need more information on what my hash joins are
doing, I'd like to be able to do:

EXPLAIN ('hash_detail', 'on') query...

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

Pavel

Show quoted text

...and have it tell me the # of buckets and batches for each hash join
and whatever other information might be useful in that context.  I
*don't* want to have the solution to that problem be: run
explain_xml() on your query, then install an XSLT parser, then use
this pregenerated XSLT stylesheet to extract your data from the XML
document you got back from step #1, then view the resulting HTML in
your web browser.  That may be a good solution for someone, but it
certainly doesn't sound convenient for anyone who works primarily at
the command line.

It's also worth noting that there are some kinds of statistics (for
example, I/O statistics) which can't be gathered without significantly
impacting the performance of the query.  So you can't just take the
approach of turning on every possible instrumentation when XML output
is requested.

I do agree with your point that even with an extensible options
syntax, we can't just add an option for any old  thing that someone
wants.  We should try to come up with a somewhat comprehensive list of
types of instrumentation that someone might want and then group them
into categories, with one toggle per category.

...Robert

#7Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#6)
Re: generic options for explain

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

...Robert

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#7)
Re: generic options for explain

2009/5/24 Robert Haas <robertmhaas@gmail.com>:

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

I prefere little bit different syntax for named params like param =
value, or param => value, but syntax with AS is one confirmed.

See to me THAT is a really strange syntax, so I guess we need some more votes.

ok

Show quoted text

...Robert

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#7)
Re: generic options for explain

Robert Haas wrote:

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

Both of these seem both odd an unnecessary. Why not just have a setting
called, say, explain_format which governs the output?

set explain_format = 'xml, verbose';
explain select * from foo;

No new function or syntax would be required.

cheers

andrew

#10Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#9)
Re: generic options for explain

On Sun, May 24, 2009 at 11:57 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).
select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more
votes.

Both of these seem both odd an unnecessary. Why not just have a setting
called, say, explain_format which governs the output?

  set explain_format = 'xml, verbose';
  explain select * from foo;

No new function or syntax would be required.

Well, then you have to issue two commands to do one thing. I don't
much like the practice of using GUCs to control behavior that you may
only want for the duration of one statement.

The fundamental problem here, at least as it seems to me, is that we
want more options for EXPLAIN, but the current syntax won't support
it, because it requires making everything a keyword (and fixing the
order). So we can either add enough punctuation to de-confuse the
parser (which is what I tried to do) or we can switch to a totally
different method of controlling EXPLAIN behavior (as you and Pavel are
advocating).

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose. And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it? Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again. Blech!

My initial thought was to just use a list of key words for the
parameters, like this:

EXPLAIN (XML, VERBOSE) query...

...but I decided that wasn't a great idea, because it means that every
parameter has to be a boolean, which is probably more limiting than we
want to be. It also gets confusing because some parameters (like
"XML" and "JSON") are incompatible while others (like "HASH_DETAIL"
and "MERGE_DETAIL") can be used together, but there's not a lot of
clue in the syntax itself that this is the case. On the other hand,
if you write:

EXPLAIN (FORMAT 'XML', FORMAT 'JSON') ...
EXPLAIN (HASH_DETAIL 'ON', MERGE_DETAIL 'ON') ...

...it's obvious that the first case is qualitatively different from
the second one. I think this makes it easier to code the behavior in
a way that is consistent and correct, and also easier to document for
the benefit of our users. Now, we can do this part of it even if we
ultimately end up with a different overall syntax, for example:

explain_query(...query..., 'xml' as format, true as hash_detail)

I do suspect this will end up being a more complicated patch, and as I
said before, I don't really see the point.

As a point of historical interest, way back when (early 1997), we used
to support this syntax:

EXPLAIN WITH [COST|PLAN|FULL] query...

...but of course that's now impossible because WITH can be the first
word of a select query.

...Robert

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose. And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it? Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again. Blech!

You know about SET LOCAL, no? I don't think this argument is very
convincing.

On the other side of the coin, I'm strongly against inventing more than
one new output format for EXPLAIN, and so any argument that depends on
examples such as "xml vs json" is falling on deaf ears here. I think
that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
seem untenable. What other options than those do you really need?
Not ones to add or remove output fields; we'd expect the client to
ignore fields it doesn't care about.

regards, tom lane

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: generic options for explain

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose.  And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it?  Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again.  Blech!

You know about SET LOCAL, no?  I don't think this argument is very
convincing.

I completely fail to see how that helps me. It's not faster or fewer
commands to type:

begin transaction; set local whatever; explain blah; commit transaction;

than it is to type:

set whatever; explain blah; set oldwhatever;

On the other side of the coin, I'm strongly against inventing more than
one new output format for EXPLAIN, and so any argument that depends on
examples such as "xml vs json" is falling on deaf ears here.  I think
that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
seem untenable.  What other options than those do you really need?
Not ones to add or remove output fields; we'd expect the client to
ignore fields it doesn't care about.

It's not just about me; we've had many requests for new EXPLAIN features.

Personally, I'd like to see the number of buckets and batches that a
hash join uses (predicted and actual), and maybe (with ANALYZE) the
number of hash collisions. I'd like to see memory utilization
statistics (peak memory utilization of hash join, for example). Other
people have requested I/O statistics (which you objected to on the
grounds that it would be too much overhead, so clearly if we're ever
going to do it it will have to be optional), and most recently number
of tuples discarded by the filter condition. We've also had requests
to suppress some information (like costs) for planner regression
testing.

I really don't see the point in restricting the syntax of EXPLAIN in
this way. I don't know exactly what sorts of useful options people
will come up with, but I do know that as long as we have an extremely
limiting options syntax, they can all be shot down on the grounds that
including them in the default output is too cumbersome for regular use
(which is absolutely true). On the other hand, I think it's
incredibly naive to suppose that EXPLAIN XML is going to make anyone
very happy. There are only two ways this can work out:

1. We'll make EXPLAIN XML output everything and the kitchen sink. In
this case, we'll soon get complaints along the lines of: "I can't use
regular EXPLAIN because it doesn't include the field that I need, but
the output of EXPLAIN XML is so voluminous that I can't read through
it by hand."
-or-
2. We'll be very restrictive about adding fields to EXPLAIN XML just
as we are now for plain EXPLAIN, in which case we haven't solved
anything.

I think XML output format is a complete distraction from the real
issue here, which is that there are certain pieces of information that
are sometimes useful but are not useful enough to justify including
them in the EXPLAIN output 100% of the time. By just punting all that
stuff to EXPLAIN XML, we're just saying that we're not interested in
creating a workable set of options to allow users to pick and choose
the information they care about - so instead we're going to dump a
huge chunk of unreadable XML and then make it the user's problem to
find a tool that will extract the details that they care about. Boo,
hiss.

Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
I would like to make the EXPLAIN syntax more powerful for command-line
use, and I'd implement XML format and JSON along the way just for
completeness. But I don't have much interest in creating an XML
output format that is the ONLY way of getting more information,
because I'm a command-line user and it does me no good at all. :-(

...Robert

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You know about SET LOCAL, no? I don't think this argument is very
convincing.

I completely fail to see how that helps me.

Mainly, what it does is avoid having to know exactly what the old
setting was.

regards, tom lane

#14Joshua Tolley
eggyknap@gmail.com
In reply to: Andrew Dunstan (#9)
Re: generic options for explain

On Sun, May 24, 2009 at 11:57:13AM -0400, Andrew Dunstan wrote:

Robert Haas wrote:

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

Both of these seem both odd an unnecessary. Why not just have a setting
called, say, explain_format which governs the output?

set explain_format = 'xml, verbose';
explain select * from foo;

No new function or syntax would be required.

A further possibility: Oracle's equivalent of EXPLAIN doesn't actually output
anything to the screen, but rather fills in a (temporary?) table somewhere with
details of the query plan. I mostly found this irritating when working with
Oracle, because each time I used it I had to look up an example query to
generate output like PostgreSQL's EXPLAIN, which is generally what I really
wanted. But since we'd still have the old EXPLAIN behavior available, perhaps
something such as an Oracle-like table filler would be useful.

Such a proposal doesn't answer the need to allow users to specify, for
performance and other reasons, the precise subset of statistics they're
interested in; for whatever it's worth, my current favorite contender in that
field is EXPLAIN (a, b, c) <query>.

- Josh / eggyknap

#15Greg Smith
gsmith@gregsmith.com
In reply to: Pavel Stehule (#2)
Re: generic options for explain

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#15)
Re: generic options for explain

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are. So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information. We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

regards, tom lane

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: generic options for explain

On Sun, May 24, 2009 at 4:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You know about SET LOCAL, no?  I don't think this argument is very
convincing.

I completely fail to see how that helps me.

Mainly, what it does is avoid having to know exactly what the old
setting was.

Ah, OK, I see what you're going for. That's not really what I'm
complaining about with that syntax, though....

...Robert

#18Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: generic options for explain

On Sun, May 24, 2009 at 6:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1.  The incremental approach here should first be adding functions that
actually do the work required.  Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser.  Starting with the parser changes
rather than the parts that actually do the work is backwards.  If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are.  So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information.  We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

The current output format basically prints out the node type and then
a collection of properties that are associated with that node, where
applicable: join type, scan direction, strategy, relation, alias,
startup cost, total cost, rows, width, loops, filter, join filter,
various types of condition (index/recheck/tid/merge/hash), sort key.
However, we tend to omit certain fields (particularly scan direction
and alias) when they contain information that isn't sufficiently
interesting. That's probably not a good idea for anything that's
intended for machine-parsing; I think for XML or JSON or
output-to-a-table we'd want to include those fields whenever possible.

With that minor complaint (and the difficulty of figuring out how to
avoid duplicating large portions of explain.c), XML or JSON output
doesn't seem that difficult. In JSON there aren't too many ways of
doing this that make sense. I would guess we'd want something like
this:

{ "node" : "Hash Join", "jointype" : "Inner", "startup_cost" :
"11.49", "total_cost" : "92.59", "rows" : "1877", "width" : "325",
'outer' : { "node" : "Seq Scan", <more stuff> }
'inner' : { "node" : "Hash", <more stuff> }
}

XML, being XML, has 10 ways of doing something when 1 is more than
sufficient, so there are a couple of ways we could go. I think this
is my favorite.

<HashJoin jointype="inner" startup_cost="11.49" total_cost="92.59"
rows="1877" width="325">
<SeqScan ... />
<Hash ...>
<SeqScan ... />
</Hash>
</HashJoin>

or you could just use one node type:

<node type="HashJoin" jointype="inner" startup_cost="11.49"
total_cost="92.59" rows="1877" width="325">
<node type="SeqScan" ... />
<node type="Hash" ...>
<node type="SeqScan" ... />
</node>
</node>

...and then there's this style:

<HashJoin>
<jointype>inner</jointype>
<startup_cost>11.49</startup_cost>
...
<outer/>
...
</outer>
<inner>
...
</inner>
</HashJoin>

...which is incredibly long and redundant, but some people who use XML
like such things. I'm sure there are other possibilities as well.

With respect to table output things are a little bit less
straightforward. There are two issues. With XML or JSON, any
properties that aren't relevant to a particular node can be omitted
altogether, whereas for a table the column list has to be consistent
throughout. We can just leave the unused columns as NULL, but it will
be a fairly wide table. Also, with apologies to Josh Berkus, we'll
need to introduce some sort of non-natural primary key to allow
children to be related to parents, because unlike XML and JSON,
there's no built-in way to make one object the child of another.

All that having been said, making EXPLAIN into a query (so that you
can do INSERT INTO foo (EXPLAIN SELECT query...) might be useful to
some people even without changing the output format at all. I think
you could throw a windowing function in there to at least tag each
line with its original position in the output, and some people might
want just that.

...Robert

#19Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#15)
Re: generic options for explain

On Sun, May 24, 2009 at 6:05 PM, Greg Smith <gsmith@gregsmith.com> wrote:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1.  The incremental approach here should first be adding functions that
actually do the work required.  Then, if there's a set of those that look to
be extremely useful, maybe at that point it's worth talking about how to
integrate them into the parser.  Starting with the parser changes rather
than the parts that actually do the work is backwards.  If you do it the
other way around, at all times you have a patch that actually provides
immediate useful value were it to be committed.

Well, perhaps I ought to be asking what sort of features people would
like to see, other than variant output formats? Maybe if we can
develop some kind of wish list for EXPLAIN, it will become more
obvious what the best option syntax is.

...Robert

#20Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#16)
Re: generic options for explain

On Sun, 24 May 2009, Tom Lane wrote:

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are.

Fair enough; I think Pavel and myself were presuming an implied "line
number" in the output there that, as you point out, doesn't actually
exist. The actual implementation detail there doesn't really change the
spirit of what I was trying to suggest though: that this feature should
get designed starting with the expected output and how to generate it,
then work backwards towards how you pass it parameters.

On that topic, I though Tom Raney was the latest to update code for the
XML output at least. Code and a presentation going over everything is at
http://web.cecs.pdx.edu/~raneyt/gsoc/ and there's a video at
http://www.postgresqlconference.org/2008/west/talks/

A note about that got sent to this list at one point but I don't see any
follow-up:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00802.php

Anybody have a better idea of what happened with that project than me?

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#21Joshua Tolley
eggyknap@gmail.com
In reply to: Tom Lane (#16)
#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#16)
#23Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#12)
#24tomas@tuxteam.de
tomas@tuxteam.de
In reply to: Dimitri Fontaine (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#23)
#26Joshua Tolley
eggyknap@gmail.com
In reply to: Robert Haas (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua Tolley (#21)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#25)
#29Joshua Tolley
eggyknap@gmail.com
In reply to: Tom Lane (#27)
In reply to: Tom Lane (#28)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua Tolley (#29)
#32David Fetter
david@fetter.org
In reply to: Tom Lane (#28)
#33Tom Raney
twraney@comcast.net
In reply to: David Fetter (#32)
#34Michael Glaesemann
grzm@seespotcode.net
In reply to: Joshua Tolley (#21)
#35Joshua Tolley
eggyknap@gmail.com
In reply to: Tom Lane (#31)
#36Andres Freund
andres@anarazel.de
In reply to: Tom Raney (#33)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#31)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#37)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#41)
#43Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#28)
#44Magnus Hagander
magnus@hagander.net
In reply to: Peter Eisentraut (#43)
#45Dave Page
dpage@pgadmin.org
In reply to: Peter Eisentraut (#43)
#46Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#43)
#47Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#43)
#48Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#47)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#43)
#50Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#49)
#51Dave Page
dpage@pgadmin.org
In reply to: Andrew Dunstan (#50)
#52Magnus Hagander
magnus@hagander.net
In reply to: Dave Page (#51)
#53Peter Eisentraut
peter_e@gmx.net
In reply to: Dave Page (#51)
#54Magnus Hagander
magnus@hagander.net
In reply to: Peter Eisentraut (#53)
#55Andrew Dunstan
andrew@dunslane.net
In reply to: Peter Eisentraut (#53)
#56tomas@tuxteam.de
tomas@tuxteam.de
In reply to: Magnus Hagander (#52)
#57Aidan Van Dyk
aidan@highrise.ca
In reply to: tomas@tuxteam.de (#56)
#58Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Peter Eisentraut (#53)
#59tomas@tuxteam.de
tomas@tuxteam.de
In reply to: Aidan Van Dyk (#57)
#60Joshua Tolley
eggyknap@gmail.com
In reply to: Dave Page (#51)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#52)
#62Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#61)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#62)
#64Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#63)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#64)
#66Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#64)
#67Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#65)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#68)
#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#69)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#70)
#72Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#71)
In reply to: Magnus Hagander (#52)
#74Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#72)
#75Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#74)